TITLE OF INVENTION

HIGH MOLECULAR WEIGHT SURFACE PROTEINS
OF NON-TYPEABLE HAEMOPHILUS

FIELD OF INVENTION
This invention relates to high molecular weight
proteins of non-typeable Haemophilus.

BACKGROUND TO THE INVENTION 
Non-typeable Haemophilus influenzae are non-encapsulated
organisms that are defined by their lack of reactivity
with antisera against known H. influenzae capsular
antigens.

These organisms commonly inhabit the upper 
respiratory tract of humans and are frequently 
responsible for infections, such as otitis media,
sinusitis, conjunctivitis, bronchitis and pneumonia.

Since these organisms do not have a polysaccharide 
capsule, they are not controlled by the present 
Haemophilus influenzae type b (Hib) vaccines, which are 
directed towards Hib bacterial capsular polysaccharides. 
The non-typeable strains, however, do produce surface
antigens that can elicit bactericidal antibodies. Two of
the major outer membrane proteins, P2 and P6, have been
identified as targets of human serum bactericidal
activity. However, it has been shown that the P2 protein
sequence is variable, in particular in the non-typeable
Haemophilus strains. Thus, a P2-based vaccine would not
protect against all strains of the organism.

There have previously been identified by Barenkamp
et al. (Pediatr. Infect. Dis. J., 9:333-339, 1990) a group
of high-molecular-weight (HMW) proteins that appeared to
be major targets of antibodies present in human
convalescent sera. Examination of a series of middle ear
isolates revealed the presence of one or two such
proteins in most strains. However, prior to the present
invention, the structures of these proteins were unknown
as were pure isolates of such proteins.

SUMMARY OF INVENTION

The inventors, in an effort to further characterize 
the high molecular weight (HMW) Haemophilus proteins, 
have cloned, expressed and sequenced the genes coding for 
two immunodominant HMW proteins (designated HMW1 and 
HMW2) from a prototype non-typeable Haemophilus strain 
and have cloned, expressed and almost completely 
sequenced the genes coding for two additional 
immunodominant HMW proteins (designated HMW3 and HMW4) 
from another non-typeable Haemophilus strain. 

In accordance with one aspect of the present 
invention, therefore, there is provided an isolated and 
purified gene coding for a high molecular weight protein 
of a non-typeable Haemophilus strain, particularly a gene 
coding for protein HMW1, HMW2, HMW3 or HMW4, as well as
any variant or fragment of such protein which retains the 
immunological ability to protect against disease caused 
by a non-typeable Haemophilus strain. In another aspect, 
the invention provides a high molecular weight protein of 
non-typeable Haemophilus influenzae which is encoded by 
these genes. 

BRIEF DESCRIPTION OF DRAWINGS 

Figure 1 is a DNA sequence of a gene coding for
protein HMW1 (SEQ ID NO: 1);

Figure 2 is a derived amino acid sequence of protein
HMW1 (SEQ ID NO: 2);

Figure 3 is a DNA sequence of a gene coding for
protein HMW2 (SEQ ID NO: 3);

Figure 4 is a derived amino acid sequence of HMW2
(SEQ ID NO: 4);

Figure 5A shows restriction maps of representative
recombinant phages which contained the HMW1 or HMW2
structural genes, the locations of the structural genes
being indicated by the shaded bars;

Figure 5B shows the restriction map of the T7
expression vector pT7-7;



WO 94/21290 



PCT/US94/02550 




Figure 6 contains the DNA sequence of a gene cluster
for the hmw1 gene (SEQ ID NO: 5), comprising nucleotides
351 to 4958 (ORF a) (as in Figure 1), as well as two
additional downstream genes in the 3' flanking region,
comprising ORFs b, nucleotides 5114-6748, and c,
nucleotides 7062-9011;

Figure 7 contains the DNA sequence of a gene cluster
for the hmw2 gene (SEQ ID NO: 6), comprising nucleotides
792 to 5222 (ORF a) (as in Figure 3), as well as two
additional downstream genes in the 3' flanking region,
comprising ORFs b, nucleotides 5375-7009, and c,
nucleotides 7249-9198;

Figure 8 is a partial DNA sequence of a gene coding
for protein HMW3 (SEQ ID NO: 7);

Figure 9 is a partial DNA sequence of a gene coding
for protein HMW4 (SEQ ID NO: 8); and

Figure 10 is a comparison table for the derived
amino acid sequence for proteins HMW1, HMW2, HMW3 and
HMW4.

GENERAL DESCRIPTION OF INVENTION

The DNA sequences of the genes coding for HMW1 and
HMW2, shown in Figures 1 and 3 respectively, were shown
to be about 80% identical, with the first 1259 base pairs
of the genes being identical. The derived amino acid
sequences of the two HMW proteins, shown in Figures 2 and
4 respectively, are about 70% identical. Furthermore,
the encoded proteins are antigenically related to the
filamentous hemagglutinin surface protein of Bordetella
pertussis. A monoclonal antibody prepared against
filamentous hemagglutinin (FHA) of Bordetella pertussis
was found to recognize both of the high molecular weight
proteins. These data suggest that the HMW and FHA
proteins may serve similar biological functions. The
derived amino acid sequences of the HMW1 and HMW2
proteins show sequence similarity to that for the FHA
protein. It has further been shown that these
antigenically-related proteins are produced by the
majority of the non-typeable strains of Haemophilus.
Antiserum raised against the protein expressed by the HMW1
gene recognizes both the HMW2 protein and the
B. pertussis FHA. The present invention includes an
isolated and purified high molecular weight protein of
non-typeable Haemophilus which is antigenically related
to the B. pertussis FHA, which may be obtained from
natural sources or produced recombinantly.

A phage genomic library of a known strain of
non-typeable Haemophilus was prepared by standard methods
and the library was screened for clones expressing high
molecular weight proteins, using a high titre antiserum
against HMW proteins. A number of strongly reactive DNA
clones were plaque-purified and sub-cloned into a T7
expression plasmid. It was found that they all expressed
either one or the other of the two high-molecular-weight
proteins designated HMW1 and HMW2, with apparent molecular
weights of 125 and 120 kDa, respectively, encoded by open
reading frames of 4.6 kb and 4.4 kb, respectively.

Representative clones expressing either HMW1 or HMW2
were further characterized and the genes isolated,
purified and sequenced. The DNA sequence of HMW1 is
shown in Figure 1 and the corresponding derived amino
acid sequence in Figure 2. Similarly, the DNA sequence of
HMW2 is shown in Figure 3 and the corresponding derived
amino acid sequence in Figure 4. Partial purification of
the isolated proteins and N-terminal sequence analysis
indicated that the expressed proteins are truncated since
their sequence starts at residue number 442 of both full
length HMW1 and HMW2 gene products.

Subcloning studies with respect to the hmw1 and hmw2
genes indicated that correct processing of the HMW
proteins required the products of additional downstream
genes. It has been found that both the hmw1 and hmw2
genes are flanked by two additional downstream open
reading frames (ORFs), designated b and c, respectively
(see Figures 6 and 7).

The b ORFs are 1635 bp in length, extending from
nucleotides 5114 to 6748 in the case of hmw1 and
nucleotides 5375 to 7009 in the case of hmw2, with their
derived amino acid sequences 99% identical. The derived
amino acid sequences demonstrate similarity with the
derived amino acid sequences of two genes which encode
proteins required for secretion and activation of
hemolysins of P. mirabilis and S. marcescens.

The c ORFs are 1950 bp in length, extending from
nucleotides 7062 to 9011 in the case of hmw1 and
nucleotides 7249 to 9198 in the case of hmw2, with their
derived amino acid sequences 96% identical. The hmw1 c
ORF is preceded by a series of 9 bp direct tandem
repeats. In plasmid subclones, interruption of the hmw1
b or c ORF results in defective processing and secretion
of the hmw1 structural gene product.
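
The ORF lengths quoted above follow directly from the nucleotide
coordinates given for Figures 6 and 7. The short Python sketch below
is offered only as an illustrative check, not as part of the original
disclosure. It assumes that the coordinates are 1-based and inclusive,
and the names ORFS, orf_length and LENGTHS are hypothetical.

```python
# Coordinates as quoted in the text for the hmw1 and hmw2 gene clusters.
# Assumption: coordinates are 1-based and inclusive at both ends.
ORFS = {
    ("hmw1", "a"): (351, 4958),
    ("hmw1", "b"): (5114, 6748),
    ("hmw1", "c"): (7062, 9011),
    ("hmw2", "a"): (792, 5222),
    ("hmw2", "b"): (5375, 7009),
    ("hmw2", "c"): (7249, 9198),
}


def orf_length(start, end):
    """Return the length in bp of a feature with inclusive coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


LENGTHS = {key: orf_length(*span) for key, span in ORFS.items()}

for (gene, orf), length in LENGTHS.items():
    # A complete reading frame should be a whole number of codons.
    print(f"{gene} ORF {orf}: {length} bp, {length // 3} codons, "
          f"remainder {length % 3}")
```

Under that assumption the b ORFs come out at 1635 bp and the c ORFs at
1950 bp in both clusters, and every length is a multiple of three.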

The two high molecular weight proteins have been
isolated and purified and shown to be partially
protective against otitis media in chinchillas and to
function as adhesins. These results indicate the
potential for use of such high molecular weight proteins
and structurally-related proteins of other non-typeable
strains of Haemophilus influenzae as components in
non-typeable Haemophilus influenzae vaccines.

Since the proteins provided herein are good
cross-reactive antigens and are present in the majority
of non-typeable Haemophilus strains, it is evident that
these HMW proteins may become integral constituents of a
universal Haemophilus vaccine. Indeed, these proteins
may be used not only as protective antigens against
otitis, sinusitis and bronchitis caused by the
non-typeable Haemophilus strains, but also may be used as
carriers for the protective Hib polysaccharides in a
conjugate vaccine against meningitis. The proteins also
may be used as carriers for other antigens, haptens and
polysaccharides from other organisms, so as to induce
immunity to such antigens, haptens and polysaccharides.

The nucleotide sequences encoding two high molecular
weight proteins of a different non-typeable Haemophilus
strain (designated HMW3 and HMW4) have been largely
elucidated, and are presented in Figures 8 and 9. HMW3
has an apparent molecular weight of 125 kDa while HMW4
has an apparent molecular weight of 123 kDa. These high
molecular weight proteins are antigenically related to
the HMW1 and HMW2 proteins and to FHA. Sequence analysis
of HMW3 is approximately 85% complete and of HMW4 95%
complete, with short stretches at the 5'-ends of each
gene remaining to be sequenced.

Figure 10 contains a multiple sequence comparison of
the derived amino acid sequences for the four high
molecular weight proteins identified herein. As may be
seen from this comparison, stretches of identical peptide
sequence may be found throughout the length of the
comparison, with HMW3 more closely resembling HMW1 and
HMW4 more closely resembling HMW2. This information is
highly suggestive of a considerable sequence homology
between high molecular weight proteins from various
non-typeable Haemophilus strains.

In addition, mutants of non-typeable H. influenzae
strains that are deficient in expression of HMW1 or HMW2
or both have been constructed and examined for their
capacity to adhere to cultured human epithelial cells.
The hmw1 and hmw2 gene clusters have been expressed in
E. coli and have been examined for in vitro adherence. The
results of such experimentation demonstrate that both
HMW1 and HMW2 mediate attachment and hence are adhesins
and that this function is present even in the absence of
other H. influenzae surface structures.

With the isolation and purification of the high
molecular weight proteins, the inventors are able to
determine the major protective epitopes by conventional
epitope mapping and synthesize peptides corresponding to
these determinants to be incorporated in fully synthetic
or recombinant vaccines. Accordingly, the invention also
comprises a synthetic peptide having an amino acid
sequence corresponding to at least one protective epitope
of a high molecular weight protein of a non-typeable
Haemophilus influenzae. Such peptides are of varying
length and constitute portions of the
high-molecular-weight proteins. They can be used to induce
immunity, either directly or as part of a conjugate,
against the relevant organisms and thus constitute
vaccines for protection against the corresponding
diseases.

The present invention also provides any variant or
fragment of the proteins that retains the potential
immunological ability to protect against disease caused
by non-typeable Haemophilus strains. The variants may be
constructed by partial deletions or mutations of the
genes and expression of the resulting modified genes to
give the protein variations.

EXAMPLES 

Example 1:

Non-typeable H. influenzae strains 5 and 12 were
isolated in pure culture from the middle ear fluid of
children with acute otitis media. Chromosomal DNA from
strain 12, providing genes encoding proteins HMW1 and
HMW2, was prepared, partially digested with Sau3A and
fractionated on sucrose gradients. Fractions containing
DNA fragments in the 9 to 20 kbp range were pooled and a
library was prepared by ligation into λEMBL3 arms.
Ligation mixtures were packaged in vitro and
plate-amplified in a P2 lysogen of E. coli LE392.

For plasmid subcloning studies, DNA from a
representative recombinant phage was subcloned into the
T7 expression plasmid pT7-7, containing the T7 RNA
polymerase promoter φ10, a ribosome-binding site and the
translational start site for the T7 gene 10 protein
upstream from a multiple cloning site (see Figure 5B).

DNA sequence analysis was performed by the dideoxy 
method and both strands of the HMW1 gene and a single 
strand of the HMW2 gene were sequenced. 

Western immunoblot analysis was performed to 
identify the recombinant proteins being produced by 
reactive phage clones. Phage lysates grown in LE392 
cells or plaques picked directly from a lawn of LE392 
cells on YT plates were solubilized in gel 
electrophoresis sample buffer prior to electrophoresis. 
Sodium dodecyl sulfate (SDS) -polyacrylamide gel 
electrophoresis was performed on 7.5% or 11% 
polyacrylamide modified Laemmli gels. After transfer of 
the proteins to nitrocellulose sheets, the sheets were 
probed sequentially with an E. coli-absorbed human serum
sample containing high-titer antibody to the
high-molecular-weight proteins and then with alkaline
phosphatase-conjugated goat anti-human immunoglobulin G
(IgG) second antibody. Sera from healthy adults contain
high-titer antibody directed against surface-exposed
high-molecular-weight proteins of non-typeable H.
influenzae. One such serum sample was used as the
screening antiserum after having been extensively
absorbed with LE392 cells.

To identify recombinant proteins being produced by
E. coli transformed with recombinant plasmids, the
plasmids of interest were used to transform E. coli BL21
(DE3)/pLysS. The transformed strains were grown to an
A600 of 0.5 in L broth containing 50 µg of ampicillin per
ml. IPTG was then added to 1 mM. One hour later, cells
were harvested, and a sonicate of the cells was prepared.
The protein concentrations of the samples were determined
by the bicinchoninic acid method. Cell sonicates
containing 100 µg of total protein were solubilized in
electrophoresis sample buffer, subjected to
SDS-polyacrylamide gel electrophoresis, and transferred to
nitrocellulose. The nitrocellulose was then probed
sequentially with the E. coli-absorbed adult serum sample
and then with alkaline phosphatase-conjugated goat
anti-human IgG second antibody.

Western immunoblot analysis also was performed to
determine whether homologous and heterologous
non-typeable H. influenzae strains expressed
high-molecular-weight proteins antigenically related to
the protein encoded by the cloned HMW1 gene (rHMW1). Cell
sonicates of bacterial cells were solubilized in
electrophoresis sample buffer, subjected to
SDS-polyacrylamide gel electrophoresis, and transferred to
nitrocellulose. Nitrocellulose was probed sequentially
with polyclonal rabbit rHMW1 antiserum and then with
alkaline phosphatase-conjugated goat anti-rabbit IgG
second antibody.

Finally, Western immunoblot analysis was performed
to determine whether non-typeable Haemophilus strains
expressed proteins antigenically related to the
filamentous hemagglutinin protein of Bordetella
pertussis. Monoclonal antibody X3C, a murine
immunoglobulin G (IgG) antibody which recognizes
filamentous hemagglutinin, was used to probe cell
sonicates by Western blot. An alkaline
phosphatase-conjugated goat anti-mouse IgG second antibody
was used for detection.

To generate recombinant protein antiserum, E. coli
BL21 (DE3)/pLysS was transformed with pHMW1-4, and
expression of recombinant protein was induced with IPTG,
as described above. A cell sonicate of the bacterial
cells was prepared and separated into a supernatant and
pellet fraction by centrifugation at 10,000 x g for 30
min. The recombinant protein fractionated with the
pellet fraction. A rabbit was subcutaneously immunized
on a biweekly schedule with 1 mg of protein from the
pellet fraction, the first dose given with Freund's
complete adjuvant and subsequent doses with Freund's
incomplete adjuvant. Following the fourth injection, the
rabbit was bled. Prior to use in the Western blot assay,
the antiserum was absorbed extensively with sonicates of
the host E. coli strain transformed with cloning vector
alone.

To assess the sharing of antigenic determinants
between HMW1 and filamentous hemagglutinin, enzyme-linked
immunosorbent assay (ELISA) plates (Costar, Cambridge,
Mass.) were coated with 60 µl of a 4-µg/ml solution of
filamentous hemagglutinin in Dulbecco's
phosphate-buffered saline per well for 2 h at room
temperature. Wells were blocked for 1 h with 1% bovine
serum albumin in Dulbecco's phosphate-buffered saline
prior to addition of serum dilutions. rHMW1 antiserum was
serially diluted in 0.1% Brij (Sigma, St. Louis, Mo.) in
Dulbecco's phosphate-buffered saline and incubated for
3 h at room temperature. After being washed, the plates
were incubated with peroxidase-conjugated goat anti-rabbit
IgG antibody (Bio-Rad) for 2 h at room temperature and
subsequently developed with
2,2'-azino-bis(3-ethylbenzthiazoline-6-sulfonic acid)
(Sigma) at a concentration of 0.54 mg/ml in 0.1 M sodium
citrate buffer, pH 4.2, containing 0.03% H2O2. Absorbances
were read on an automated ELISA reader.
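
The quantities in the ELISA protocol above reduce to simple
arithmetic. The following Python sketch is illustrative only. It
computes the mass of filamentous hemagglutinin applied per well from
the stated volume and concentration, and lays out a dilution series
for the antiserum. The starting dilution, dilution factor and number
of steps are not given in the text and are purely hypothetical
placeholders, as are the function names.

```python
def coated_mass_ug(volume_ul, conc_ug_per_ml):
    """Mass of antigen applied to a well, in micrograms."""
    return volume_ul / 1000.0 * conc_ug_per_ml


def serial_dilutions(start_reciprocal, factor, steps):
    """Reciprocal dilutions of a serial series, e.g. 100 means 1:100."""
    return [start_reciprocal * factor ** i for i in range(steps)]


# Values taken from the text: 60 microlitres of a 4 microgram/ml solution.
per_well = coated_mass_ug(60, 4)
print(f"antigen applied per well: {per_well:.2f} micrograms")

# Hypothetical series (not from the text): 1:100 start, two-fold, 5 steps.
series = serial_dilutions(100, 2, 5)
print("reciprocal dilutions:", series)
```

This gives the amount applied to each well, which is an upper bound on
the amount actually bound to the plastic.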

Recombinant phage expressing HMW1 or HMW2 were
recovered as follows. The non-typeable H. influenzae
strain 12 genomic library was screened for clones
expressing high-molecular-weight proteins with an E.
coli-absorbed human serum sample containing a high titer
of antibodies directed against the high-molecular-weight
proteins.
Numerous strongly reactive clones were identified
along with more weakly reactive ones. Twenty strongly
reactive clones were plaque-purified and examined by
Western blot for expression of recombinant proteins.
Each of the strongly reactive clones expressed one of two
types of high-molecular-weight proteins, designated HMW1
and HMW2. The major immunoreactive protein bands in the
HMW1 and HMW2 lysates migrated with apparent molecular
masses of 125 and 120 kDa, respectively. In addition to
the major bands, each lysate contained minor protein
bands of higher apparent molecular weight. Protein bands
seen in the HMW2 lysates at molecular masses of less than
120 kDa were not regularly observed and presumably
represent proteolytic degradation products. Lysates of
LE392 infected with the λEMBL3 cloning vector alone were
non-reactive when immunologically screened with the same
serum sample. Thus, the observed activity was not due to
cross-reactive E. coli proteins or λEMBL3-encoded
proteins. Furthermore, the recombinant proteins were not
simply binding immunoglobulin nonspecifically, since the
proteins were not reactive with the goat anti-human IgG
conjugate alone, with normal rabbit sera, or with serum
from a number of healthy young infants.

Representative clones expressing either the HMW1 or
HMW2 recombinant proteins were characterized further.
The restriction maps of the two phage types were
different from each other, including the regions encoding
the HMW1 and HMW2 structural genes. Figure 5A shows
restriction maps of representative recombinant phage
which contained the HMW1 or HMW2 structural genes. The
locations of the structural genes are indicated by the
shaded bars.

HMW1 plasmid subclones were constructed by using the
T7 expression plasmid pT7-7 (Fig. 5A and B). HMW2 plasmid
subclones also were constructed, and the results with
these latter subclones were similar to those observed
with the HMW1 constructs.

The approximate location and direction of
transcription of the HMW1 structural gene were initially
determined by using plasmid pHMW1 (Fig. 5A). This
plasmid was constructed by inserting the 8.5-kb
BamHI-SalI fragment from λHMW1 into BamHI- and SalI-cut
pT7-7. E. coli transformed with pHMW1 expressed an
immunoreactive recombinant protein with an apparent
molecular mass of 115 kDa, which was strongly inducible
with IPTG. This protein was significantly smaller than
the 125-kDa major protein expressed by the parent phage,
indicating that it either was being expressed as a fusion
protein or was truncated at the carboxy terminus.

To more precisely localize the 3' end of the
structural gene, additional plasmids were constructed
with progressive deletions from the 3' end of the pHMW1
construct. Plasmid pHMW1-1 was constructed by digestion
of pHMW1 with PstI, isolation of the resulting 8.8-kb
fragment, and religation. Plasmid pHMW1-2 was
constructed by digestion of pHMW1 with HindIII, isolation
of the resulting 7.5-kb fragment, and religation. E.
coli transformed with either plasmid pHMW1-1 or pHMW1-2
also expressed an immunoreactive recombinant protein with
an apparent molecular mass of 115 kDa. These results
indicated that the 3' end of the structural gene was 5'
of the HindIII site.

To more precisely localize the 5' end of the gene,
plasmids pHMW1-4 and pHMW1-7 were constructed. Plasmid
pHMW1-4 was constructed by cloning the 5.1-kb
BamHI-HindIII fragment from λHMW1 into a pT7-7-derived
plasmid containing the upstream 3.8-kb EcoRI-BamHI
fragment. E. coli transformed with pHMW1-4 expressed an
immunoreactive protein with an apparent molecular mass of
approximately 160 kDa. Although protein production was
inducible with IPTG, the levels of protein production in
these transformants were substantially lower than those
with the pHMW1-2 transformants described above. Plasmid
pHMW1-7 was constructed by digesting pHMW1-4 with NdeI
and SpeI. The 9.0-kbp fragment generated by this double
digestion was isolated, blunt ended, and religated. E.
coli transformed with pHMW1-7 also expressed an
immunoreactive protein with an apparent molecular mass of
160 kDa, a protein identical in size to that expressed by
the pHMW1-4 transformants. The result indicated that the
initiation codon for the HMW1 structural gene was 3' of
the SpeI site. DNA sequence analysis confirmed this
conclusion.

As noted above, the λHMW1 phage clones expressed a
major immunoreactive band of 125 kDa, whereas the HMW1
plasmid clones pHMW1-4 and pHMW1-7, which contained what
was believed to be the full-length gene, expressed an
immunoreactive protein of approximately 160 kDa. This
size discrepancy was disconcerting. One possible
explanation was that an additional gene or genes
necessary for correct processing of the HMW1 gene product
were deleted in the process of subcloning. To address
this possibility, plasmid pHMW1-14 was constructed. This
construct was generated by digesting pHMW1 with NdeI and
MluI and inserting the 7.6-kbp NdeI-MluI fragment
isolated from pHMW1-4. Such a construct would contain
the full-length HMW1 gene as well as the DNA 3' of the
HMW1 gene which was present in the original HMW1 phage.
E. coli transformed with this plasmid expressed major
immunoreactive proteins with apparent molecular masses of
125 and 160 kDa as well as additional degradation
products. The 125- and 160-kDa bands were identical to
the major and minor immunoreactive bands detected in the
HMW1 phage lysates. Interestingly, the pHMW1-14
construct also expressed significant amounts of protein
in the uninduced condition, a situation not observed with
the earlier constructs.

The relationship between the 125- and 160-kDa
proteins remains somewhat unclear. Sequence analysis,
described below, reveals that the HMW1 gene would be
predicted to encode a protein of 159 kDa. It is believed
that the 160-kDa protein is a precursor form of the
mature 125-kDa protein, with the conversion from one
protein to the other being dependent on the products of
the two downstream genes.

Sequence analysis of the HMW1 gene (Figure 1)
revealed a 4,608-bp open reading frame (ORF), beginning
with an ATG codon at nucleotide 351 and ending with a TAG
stop codon at nucleotide 4959. A putative
ribosome-binding site with the sequence AGGAG begins 10 bp
upstream of the putative initiation codon. Five other
in-frame ATG codons are located within 250 bp of the
beginning of the ORF, but none of these is preceded by a
typical ribosome-binding site. The 5'-flanking region of
the ORF contains a series of direct tandem repeats, with
the 7-bp sequence ATCTTTC repeated 16 times. These
tandem repeats stop 100 bp 5' of the putative initiation
codon. An 8-bp inverted repeat characteristic of a
rho-independent transcriptional terminator is present,
beginning at nucleotide 4983, 25 bp 3' of the presumed
translational stop. Multiple termination codons are
present in all three reading frames both upstream and
downstream of the ORF. The derived amino acid sequence
of the protein encoded by the HMW1 gene (Figure 2) has a
molecular weight of 159,000, in good agreement with the
apparent molecular weights of the proteins expressed by
the HMW1-4 and HMW1-7 transformants. The derived amino
acid sequence of the amino terminus does not demonstrate
the characteristics of a typical signal sequence. The
BamHI site used in generation of pHMW1 comprises bp 1743
through 1748 of the nucleotide sequence. The ORF
downstream of the BamHI site would be predicted to encode
a protein of 111 kDa, in good agreement with the 115 kDa
estimated for the apparent molecular mass of the
pHMW1-encoded fusion protein.
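
The 111-kDa prediction for the portion of the ORF downstream of the
BamHI site can be approximated from figures already given in this
paragraph. The Python sketch below is only a rough check, not the
method used to obtain the stated value. It assumes that mass is
distributed uniformly along the ORF, so that the predicted mass scales
with the fraction of the reading frame retained, and it takes the ORF
to span 4,608 bp from nucleotide 351. The function name is
hypothetical.

```python
ORF_START = 351          # first nucleotide of the HMW1 ORF (from the text)
ORF_LENGTH_BP = 4608     # length of the HMW1 ORF (from the text)
FULL_MASS_KDA = 159.0    # predicted mass of the full product (from the text)
BAMHI_SITE_END = 1748    # last nucleotide of the BamHI site (from the text)


def downstream_mass_kda(cut_after, orf_start=ORF_START,
                        orf_length=ORF_LENGTH_BP, full_mass=FULL_MASS_KDA):
    """Estimate the mass encoded 3' of a cut, assuming uniform mass per bp."""
    orf_end = orf_start + orf_length - 1
    if not orf_start <= cut_after < orf_end:
        raise ValueError("cut position must lie inside the ORF")
    remaining_bp = orf_end - cut_after
    return full_mass * remaining_bp / orf_length


estimate = downstream_mass_kda(BAMHI_SITE_END)
print(f"estimated mass downstream of the BamHI site: {estimate:.1f} kDa")
```

On this simplification the estimate rounds to 111 kDa, consistent with
the value quoted above. A calculation from the actual residue
composition would differ slightly.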

The sequence of the HMW2 gene (Figure 3) consists of
a 4,431-bp ORF, beginning with an ATG codon at nucleotide
352 and ending with a TAG stop codon at nucleotide 4783.
The first 1,259 bp of the ORF of the HMW2 gene are
identical to those of the HMW1 gene. Thereafter, the
sequences begin to diverge but are 80% identical overall.
With the exception of a single base addition at
nucleotide 93 of the HMW2 sequence, the 5'-flanking
regions of the HMW1 and HMW2 genes are identical for 310
bp upstream from the respective initiation codons. Thus,
the HMW2 gene is preceded by the same set of tandem
repeats and the same putative ribosome-binding site which
lies 5' of the HMW1 gene. A putative transcriptional
terminator identical to that identified 3' of the HMW1
ORF is noted, beginning at nucleotide 4804. The
discrepancy in the lengths of the two genes is
principally accounted for by a 186-bp gap in the HMW2
sequence, beginning at nucleotide position 3839. The
derived amino acid sequence of the protein encoded by the
HMW2 gene (Figure 4) has a molecular weight of 155,000
and is 71% identical with the derived amino acid sequence
of the HMW1 gene.
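
The tandem repeats that precede both genes are the kind of feature
that can be located programmatically in a nucleotide sequence. The
Python sketch below is illustrative only. It finds the longest run of
back-to-back copies of a motif and is exercised on a toy sequence in
which the 7-bp repeat ATCTTTC is placed between invented flanking
bases. The toy sequence and the function name are hypothetical and are
not taken from Figure 1 or Figure 3.

```python
import re


def longest_tandem_run(sequence, motif):
    """Return (start_index, copies) for the longest run of adjacent motifs.

    start_index is 0-based; (-1, 0) is returned when the motif is absent.
    """
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    best_start, best_copies = -1, 0
    for match in pattern.finditer(sequence.upper()):
        copies = (match.end() - match.start()) // len(motif)
        if copies > best_copies:
            best_start, best_copies = match.start(), copies
    return best_start, best_copies


# Toy sequence: invented flanks around 16 copies of the repeat unit.
toy = "GGCA" + "ATCTTTC" * 16 + "TTGACA"
print(longest_tandem_run(toy, "ATCTTTC"))
```

Applied to the actual 5'-flanking sequence, a search of this kind
would be expected to report the 16 copies described in the text.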

The derived amino acid sequences of both the HMW1
and HMW2 genes (Figures 2 and 4) demonstrated sequence
similarity with the derived amino acid sequence of
filamentous hemagglutinin of Bordetella pertussis, a
surface-associated protein of this organism. The initial
and optimized TFASTA scores for the HMW1-filamentous
hemagglutinin sequence comparison were 87 and 186,
respectively, with a word size of 2. The z score for the
comparison was 45.8. The initial and optimized TFASTA
scores for the HMW2-filamentous hemagglutinin sequence
comparison were 68 and 196, respectively. The z score
for the latter comparison was 48.7. The magnitudes of
the initial and optimized TFASTA scores and the z scores
suggested that a biologically significant relationship
existed between the HMW1 and HMW2 gene products and
filamentous hemagglutinin. When the derived amino acid
sequences of HMW1, HMW2, and filamentous hemagglutinin
genes were aligned and compared, the similarities were
most notable at the amino-terminal ends of the three
sequences. Twelve of the first 22 amino acids in the
predicted peptide sequences were identical. In
addition, the sequences demonstrated a common
five-amino-acid stretch, Asn-Pro-Asn-Gly-Ile, and several
shorter stretches of sequence identity within the first
200 amino acids.
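
Identity figures such as those quoted above are ratios of matching
positions to compared positions in an alignment. The Python sketch
below is illustrative only. It computes percent identity for two
pre-aligned sequences and skips any column that contains a gap, which
is one of several conventions in use and is an assumption here. The
two short peptides are invented for illustration. They contain the
Asn-Pro-Asn-Gly-Ile motif written in one-letter code but are not taken
from Figures 2 and 4.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Invented peptides sharing the NPNGI motif (one-letter code).
toy_identity = percent_identity("MNPNGIAK", "MNPNGLSK")
print(f"toy identity: {toy_identity:.1f}%")

# The 12-of-22 figure from the text expressed as a percentage.
n_terminal = 100.0 * 12 / 22
print(f"12 of 22 residues: {n_terminal:.1f}%")
```

Twelve identical residues out of 22 corresponds to roughly 55%
identity over that amino-terminal window.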

Example 2:

To further explore the HMW1-filamentous
hemagglutinin relationship, the ability of antiserum
prepared against the HMW1-4 recombinant protein (rHMW1)
to recognize purified filamentous hemagglutinin was
assessed. The rHMW1 antiserum demonstrated ELISA
reactivity with filamentous hemagglutinin in a
dose-dependent manner. Preimmune rabbit serum had minimal
reactivity in this assay. The rHMW1 antiserum also was
examined in a Western blot assay and demonstrated weak
but positive reactivity with purified filamentous
hemagglutinin in this system.

To identify the native Haemophilus protein
corresponding to the HMW1 gene product and to determine
the extent to which proteins antigenically related to the
HMW1 cloned gene product were common among other
non-typeable H. influenzae strains, a panel of Haemophilus
strains was screened by Western blot with the rHMW1
antiserum. The antiserum recognized both a 125- and a
120-kDa protein band in the homologous strain 12, the
putative mature protein products of the HMW1 and HMW2
genes, respectively.

When used to screen heterologous non-typeable H.
influenzae strains, rHMW1 antiserum recognized
high-molecular-weight proteins in 75% of 125
epidemiologically unrelated strains. In general, the
antiserum reacted with one or two protein bands in the
100- to 150-kDa range in each of the heterologous strains
in a pattern similar but not identical to that seen in the
homologous strain.

Monoclonal antibody X3C is a murine IgG antibody
directed against the filamentous hemagglutinin protein of
B. pertussis. This antibody can inhibit the binding of
B. pertussis cells to Chinese hamster ovary cells and
HeLa cells in culture and will inhibit hemagglutination
of erythrocytes by purified filamentous hemagglutinin.

A Western blot assay was performed in which this
monoclonal antibody was screened against the same panel
of non-typeable H. influenzae strains discussed above.
Monoclonal antibody X3C recognized both the
high-molecular-weight proteins in non-typeable H.
influenzae strain 12 which were recognized by the
recombinant-protein antiserum. In addition, the monoclonal
antibody recognized protein bands in a subset of
heterologous non-typeable H. influenzae strains which were
identical to those recognized by the recombinant-protein
antiserum. On occasion, the filamentous hemagglutinin
monoclonal antibody appeared to recognize only one of the
two bands which had been recognized by the
recombinant-protein antiserum. Overall, monoclonal
antibody X3C recognized high-molecular-weight protein
bands identical to those recognized by the rHMW1 antiserum
in approximately 35% of our collection of non-typeable H.
influenzae strains.

Example 3:

Mutants deficient in expression of HMW1, HMW2 or both
proteins were constructed to examine the role of these
proteins in bacterial adherence. The following strategy
was employed. pHMW1-14 (see Example 1, Figure 5A) was
digested with BamHI and then ligated to a kanamycin
cassette isolated on a 1.3-kb BamHI fragment from pUC4K.
The resultant plasmid (pHMW1-17) was linearized by
digestion with XbaI and transformed into non-typeable H.
influenzae strain 12, followed by selection for
kanamycin-resistant colonies. Southern analysis of a series of
these colonies demonstrated two populations of
transformants, one with an insertion in the HMW1
structural gene and the other with an insertion in the
HMW2 structural gene. One mutant from each of these
classes was selected for further studies.

Mutants deficient in expression of both proteins
were recovered using the following protocol. After
deletion of the 2.1-kb fragment of DNA between two EcoRI
sites spanning the 3'-portion of the HMW1 structural gene
in pHMW-15, the kanamycin cassette from pUC4K was
inserted as a 1.3-kb EcoRI fragment. The resulting
plasmid (pHMW1-16) was linearized by digestion with XbaI
and transformed into strain 12, followed again by
selection for kanamycin-resistant colonies. Southern
analysis of a representative sampling of these colonies
demonstrated that in seven of eight cases, insertion into
both the HMW1 and HMW2 loci had occurred. One such
mutant was selected for further studies.

To confirm the intended phenotypes, the mutant
strains were examined by Western blot analysis with a
polyclonal antiserum against recombinant HMW1 protein.
The parental strain expressed both the 125-kD HMW1 and
the 120-kD HMW2 protein. In contrast, the HMW2- mutant
failed to express the 120-kD protein, and the HMW1- mutant
failed to express the 125-kD protein. The double mutant
lacked expression of either protein. On the basis of
whole cell lysates, outer membrane profiles, and colony
morphology, the wild type strain and the mutants were
otherwise identical with one another. Transmission
electron microscopy demonstrated that none of the four
strains expressed pili.

The capacity of wild type strain 12 to adhere to
Chang epithelial cells was examined. In such assays,
bacteria were inoculated into broth and allowed to grow
to a density of ~2 x 10^9 cfu/ml. Approximately 2 x 10^7
cfu were inoculated onto epithelial cell monolayers, and
plates were gently centrifuged at 165 x g for 5 minutes
to facilitate contact between bacteria and the epithelial
surface. After incubation for 30 minutes at 37°C in 5%
CO2, monolayers were rinsed 5 times with PBS to remove
nonadherent organisms and were treated with trypsin-EDTA
(0.05% trypsin, 0.5% EDTA) in PBS to release them from
the plastic support. Well contents were agitated, and
dilutions were plated on solid medium to yield the number
of adherent bacteria per monolayer. Percent adherence
was calculated by dividing the number of adherent cfu per
monolayer by the number of inoculated cfu.
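
The adherence calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the plate count, the dilution factor and the volumes are hypothetical and are not taken from the experiments reported here; only the inoculum of 2 x 10^7 cfu comes from the text.

```python
def cfu_per_well(colonies, dilution_factor, plated_volume_ml, well_volume_ml):
    """Back-calculate total cfu in a well from a colony count on a dilution plate.

    All four inputs are hypothetical illustration values, not reported data.
    """
    if plated_volume_ml <= 0 or well_volume_ml <= 0:
        raise ValueError("volumes must be positive")
    return colonies * dilution_factor * well_volume_ml / plated_volume_ml


def percent_adherence(adherent_cfu, inoculated_cfu):
    """Percent adherence = adherent cfu per monolayer / inoculated cfu x 100."""
    if inoculated_cfu <= 0:
        raise ValueError("inoculated_cfu must be positive")
    return 100.0 * adherent_cfu / inoculated_cfu


# Hypothetical example: 176 colonies from 0.1 ml of a 10^4-fold dilution
# of a 1 ml well, with the 2 x 10^7 cfu inoculum stated in the text.
adherent = cfu_per_well(176, 1e4, 0.1, 1.0)
print(f"{percent_adherence(adherent, 2e7):.1f}% of inoculum adherent")
```

With these made-up numbers the result is close to 88%, which is in the range reported for the wild type strain, but the inputs were chosen only to illustrate the arithmetic.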

As depicted in Table 1 below (the Tables appear at
the end of the descriptive text), this strain adhered
quite efficiently, with nearly 90% of the inoculum
binding to the monolayer. Adherence by the mutant
expressing HMW1 but not HMW2 (HMW2-) was also quite
efficient and comparable to that by the wild type strain.
In contrast, attachment by the strain expressing HMW2 but
deficient in expression of HMW1 (HMW1-) was decreased
about 15-fold relative to the wild type. Adherence by
the double mutant (HMW1-/HMW2-) was decreased even
further, approximately 50-fold compared with the wild
type and approximately 3-fold compared with the HMW1-
mutant. Considered together, these results suggest that
both the HMW1 protein and the HMW2 protein influence
attachment to Chang epithelial cells. Interestingly,
optimal adherence to this cell line appears to require
HMW1 but not HMW2.

Example 4:

Using the plasmids pHMW1-16 and pHMW1-17 (see
Example 3) and following a scheme similar to that
employed with strain 12 as described in Example 3, three
non-typeable Haemophilus strain 5 mutants were isolated,
including one with the kanamycin gene inserted into the
hmw1-like (designated hmw3) locus, a second with an
insertion in the hmw2-like (designated hmw4) locus, and
a third with insertions in both loci. As predicted,
Western immunoblot analysis demonstrated that the mutant
with insertion of the kanamycin cassette into the
hmw1-like locus had lost expression of the HMW3 125-kD
protein, while the mutant with insertion into the
hmw2-like locus failed to express the HMW4 123-kD protein.
The mutant with a double insertion was unable to express
either of the high molecular weight proteins.

As shown in Table 1 below, wild type strain 5
demonstrated high level adherence, with almost 80% of the
inoculum adhering per monolayer. Adherence by the mutant
deficient in expression of the HMW2-like protein was also
quite high. In contrast, adherence by the mutant unable
to express the HMW1-like protein was reduced about 5-fold
relative to the wild type, and attachment by the
double mutant was diminished even further (approximately
25-fold). Examination of Giemsa-stained samples
confirmed these observations (not shown). Thus, the
results with strain 5 corroborate the findings with
strain 12 and the HMW1 and HMW2 proteins.

Example 5:

To confirm an adherence function for the HMW1 and
HMW2 proteins and to examine the effect of HMW1 and HMW2
independently of other H. influenzae surface structures,
the hmw1 and the hmw2 gene clusters were introduced into
E. coli DH5α, using plasmids pHMW1-14 and pHMW2-21,
respectively. As a control, the cloning vector, pT7-7,
was also transformed into E. coli DH5α. Western blot
analysis demonstrated that E. coli DH5α containing the
hmw1 genes expressed a 125-kDa protein, while the same
strain harboring the hmw2 genes expressed a 120-kDa
protein. E. coli DH5α containing pT7-7 failed to react
with antiserum against recombinant HMW1. Transmission
electron microscopy revealed no pili or other surface
appendages on any of the E. coli strains.

Adherence by the E. coli strains was quantitated and
compared with adherence by wild type non-typeable H.
influenzae strain 12. As shown in Table 2 below,
adherence by E. coli DH5α containing vector alone was
less than 1% of that for strain 12. In contrast, E. coli
DH5α harboring the hmw1 gene cluster demonstrated
adherence levels comparable to those for strain 12.
Adherence by E. coli DH5α containing the hmw2 genes was
approximately 6-fold lower than attachment by strain 12
but was increased 20-fold over adherence by E. coli DH5α
with pT7-7 alone. These results indicate that the HMW1
and HMW2 proteins are capable of independently mediating
attachment to Chang conjunctival cells. These results
are consistent with the results with the H. influenzae
mutants reported in Examples 3 and 4, providing further
evidence that, with Chang epithelial cells, HMW1 is a
more efficient adhesin than is HMW2.

Experiments with E. coli HB101 harboring pT7-7,
pHMW1-14, or pHMW2-21 confirmed the results obtained with
the DH5α derivatives (see Table 2).

Example 6:

HMW1 and HMW2 were isolated and purified from
non-typeable H. influenzae (NTHI) strain 12 in the following
manner. Non-typeable Haemophilus bacteria from frozen
stock culture were streaked onto a chocolate plate and
grown overnight at 37°C in an incubator with 5% CO2.
A 50 ml starter culture of brain heart infusion (BHI) broth,
supplemented with 10 ng/ml each of hemin and NAD, was
inoculated with growth from the chocolate plate. The starter
culture was grown until the optical density (O.D. at
600 nm) reached 0.6 to 0.8, and then the bacteria in the
starter culture were used to inoculate six 500 ml flasks
of supplemented BHI using 8 to 10 ml per flask. The
bacteria were grown in the 500 ml flasks for an additional 5
to 6 hours, at which time the O.D. was 1.5 or greater.
Cultures were centrifuged at 10,000 rpm for 10 minutes.

Bacterial pellets were resuspended in a total volume
of 250 ml of an extraction solution comprising 0.5 M
NaCl, 0.01 M Na2EDTA, 0.01 M Tris, 50 µM
1,10-phenanthroline, pH 7.5. The cells were not sonicated or
otherwise disrupted. The resuspended cells were allowed
to sit on ice at 0°C for 60 minutes. The resuspended
cells were centrifuged at 10,000 rpm for 10 minutes at
4°C to remove the majority of intact cells and cellular
debris. The supernatant was collected and centrifuged at
100,000 x g for 60 minutes at 4°C. The supernatant again
was collected and dialyzed overnight at 4°C against 0.01
M sodium phosphate, pH 6.0.
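
As a worked illustration of the extraction solution above, the following Python sketch converts the stated concentrations and the 250 ml volume into weighed amounts. The molar masses are standard values, but the choice of salt forms (disodium EDTA dihydrate, Tris base, anhydrous 1,10-phenanthroline) is an assumption made for the example; the text does not say which forms were used.

```python
# Molar masses in g/mol. The hydrate/base forms are assumptions for this
# illustration; the source names the components but not the reagent forms.
MOLAR_MASS = {
    "NaCl": 58.44,
    "Na2EDTA (dihydrate, assumed)": 372.24,
    "Tris (base, assumed)": 121.14,
    "1,10-phenanthroline (anhydrous, assumed)": 180.21,
}

# Target concentrations in mol/l, as stated in the text.
TARGET_MOLARITY = {
    "NaCl": 0.5,
    "Na2EDTA (dihydrate, assumed)": 0.01,
    "Tris (base, assumed)": 0.01,
    "1,10-phenanthroline (anhydrous, assumed)": 50e-6,
}


def grams_needed(molarity, volume_l, molar_mass):
    """Mass (g) of solute for a given molarity (mol/l) and volume (l)."""
    return molarity * volume_l * molar_mass


VOLUME_L = 0.250  # 250 ml total volume, as stated in the text

recipe = {
    name: grams_needed(TARGET_MOLARITY[name], VOLUME_L, MOLAR_MASS[name])
    for name in MOLAR_MASS
}
for name, grams in recipe.items():
    print(f"{name}: {grams:.4f} g")
```

Under these assumptions the solution needs roughly 7.3 g NaCl, 0.93 g EDTA salt, 0.30 g Tris and about 2.3 mg phenanthroline, with the pH adjusted to 7.5 afterwards.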

The sample was centrifuged at 10,000 rpm for 10
minutes at 4°C to remove insoluble debris precipitated
from solution during dialysis. The supernatant was
applied to a 10 ml CM Sepharose column which had been
pre-equilibrated with 0.01 M sodium phosphate, pH 6.
Following application to this column, the column was
washed with 0.01 M sodium phosphate. Proteins were
eluted from the column with a 0 - 0.5 M KCl gradient in
0.01 M Na phosphate, pH 6, and fractions were collected
for gel examination. Coomassie gels of column fractions
were carried out to identify those fractions containing
high molecular weight proteins. The fractions containing
high molecular weight proteins were pooled and
concentrated to a 1 to 3 ml volume in preparation for
application of the sample to a gel filtration column.

A Sepharose CL-4B gel filtration column was
equilibrated with phosphate-buffered saline, pH 7.5. The
concentrated high molecular weight protein sample was
applied to the gel filtration column and column fractions
were collected. Coomassie gels were performed on the
column fractions to identify those containing high
molecular weight proteins. The column fractions
containing high molecular weight proteins were pooled.

The proteins were tested to determine whether they 
would protect against experimental otitis media caused by 
the homologous strain. 

Chinchillas received three monthly subcutaneous
injections with 40 µg of an HMW1-HMW2 protein mixture in
Freund's adjuvant. One month after the last injection,
the animals were challenged by intrabullar inoculation
with 300 cfu of NTHI strain 12.

Infection developed in 5 of 5 control animals versus
5 of 10 immunized animals. Among infected animals,
geometric mean bacterial counts in middle ear fluid 7
days post-challenge were 7.4 x 10^6 in control animals
versus 1.3 x 10^5 in immunized animals.
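
For readers unfamiliar with geometric means of bacterial counts, the sketch below shows how such a mean is computed and how the two reported means compare. The individual counts passed to `geometric_mean` are hypothetical; only the two group means (7.4 x 10^6 and 1.3 x 10^5) and the infection fractions come from the text.

```python
import math


def geometric_mean(counts):
    """Geometric mean of positive counts, computed on the log scale."""
    if not counts or any(c <= 0 for c in counts):
        raise ValueError("counts must be a non-empty list of positive values")
    return math.exp(sum(math.log(c) for c in counts) / len(counts))


# Hypothetical per-animal counts, used only to show the calculation.
example_counts = [1e5, 1e7]
print(f"geometric mean of example counts: {geometric_mean(example_counts):.3g}")

# Reported group means (from the text).
control_gm = 7.4e6
immunized_gm = 1.3e5
fold_reduction = control_gm / immunized_gm
log10_reduction = math.log10(fold_reduction)
print(f"fold reduction: {fold_reduction:.1f}, log10 reduction: {log10_reduction:.2f}")

# Reported infection fractions (from the text): 5 of 5 versus 5 of 10.
control_rate = 5 / 5
immunized_rate = 5 / 10
print(f"infection rate: control {control_rate:.0%}, immunized {immunized_rate:.0%}")
```

On the reported means, counts in infected immunized animals were lower by a factor of roughly 57, or a little under two log10 units.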

Serum antibody titres following immunization were 
comparable in uninfected and infected animals. However, 
infection in immunized animals was uniformly associated 
with the appearance of bacteria down-regulated in 
expression of the HMW proteins, suggesting bacterial 
selection in response to immunologic pressure. 

Although these data show that protection following
immunization was not complete, they suggest that the HMW
adhesin proteins are potentially important protective
antigens which may comprise one component of a
multi-component NTHI vaccine.

These animal challenge tests were repeated in
chinchillas at a lower challenge dose than the 300 cfu
employed above. In this instance, complete protection
was achieved. In these experiments, groups of five
animals were immunized with 20 µg of the HMW1-HMW2
mixture on days 1, 28, and 42 in the presence of AlPO4.
Blood samples were collected on day 53 to monitor the
antibody response. On day 56, the left ear of each animal
was challenged with about 10 cfu of H. influenzae strain
12. Ear infection was monitored on day 4. Four animals
in Group 3 had been infected previously with H. influenzae
strain 12 and had recovered completely for at least one
month before the second challenge. The results are
outlined in the following Table A:

TABLE A

Protective ability of HMW protein against
non-typeable H. influenzae challenge
in chinchilla model

                                 Number of Animals Showing Positive Ear Infection
Group                  Total     ------------------------------------------------
(#)    Antigens        Animals   Tympanogram   Otoscopic      cfu of Bacteria/
                                               Examination    10 µL

1      HMW             5         0             0              0
2      None            5         5             5              850-3200 (4/5)
3      Convalescent    4         0             0              0


Example 7:

A number of synthetic peptides were derived from
HMW1. Antisera were then raised to these peptides. The
anti-peptide antiserum to peptide HMW1-P5 was shown to
recognize HMW1. Peptide HMW1-P5 covers amino acids 1453
to 1481 of HMW1, has the sequence
VDEVIEAKRILEKVKDLSDEEREALAKLG (SEQ ID NO: 9), and
represents bases 1498 to 1576 in Figure 10.

This finding demonstrates that the DNA sequence and
the derived protein are being interpreted in the correct
reading frame and that peptides derived from the sequence
can be produced which will be immunogenic.
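
The reading-frame point can be checked directly against the sequence listing. The sketch below translates the stretch of SEQ ID NO:1 that lines up with residues 1453 to 1481 and compares it with the HMW1-P5 peptide. It rests on two assumptions read off the listing rather than stated in the text: that the open reading frame starts at the ATG at base 351 of SEQ ID NO:1, and that the standard genetic code applies. The function names are illustrative.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 1 to one-letter amino acids; spaces are ignored."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def codon_span(orf_start, first_residue, last_residue):
    """1-based base positions covering residues first..last of an ORF."""
    start = orf_start + 3 * (first_residue - 1)
    end = orf_start + 3 * last_residue - 1
    return start, end


ORF_START = 351  # assumed: position of the initiating ATG in SEQ ID NO:1
PEPTIDE = "VDEVIEAKRILEKVKDLSDEEREALAKLG"  # HMW1-P5, residues 1453-1481

# Bases 4707-4793 of SEQ ID NO:1, copied from the 4740 and 4800 rows of the listing.
SEGMENT = ("GTAG ATGAAGTAAT TGAAGCGAAA CGCATCCTTG AGAAGGTAAA "
           "AGATTTATCT GATGAAGAAA GAGAAGCGTT AGCTAAACTT GGA")

print(codon_span(ORF_START, 1453, 1481))
print(translate(SEGMENT) == PEPTIDE)
# Shifting the frame by one base no longer reproduces the peptide.
print(translate(SEGMENT.replace(" ", "")[1:]) == PEPTIDE)
```

In-frame translation reproduces the peptide, while a one-base shift does not, which is the sense in which antibody recognition of HMW1 by the anti-peptide antiserum supports the assigned reading frame.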

SUMMARY OF DISCLOSURE

In summary of this disclosure, the present invention
provides high molecular weight proteins of non-typeable
Haemophilus, genes coding for the same and vaccines
incorporating such proteins. Modifications are possible
within the scope of this invention.

Table 1. Effect of mutation of high molecular weight
proteins on adherence to Chang epithelial cells by
nontypable H. influenzae.

                                       ADHERENCE*
                            -------------------------------------
Strain                      % inoculum      relative to wild type†

Strain 12 derivatives
  wild type                 87.7 ± 5.9      100.0 ± 6.7
  HMW1- mutant              6.0 ± 0.9       6.8 ± 1.0
  HMW2- mutant              89.9 ± 10.8     102.5 ± 12.3
  HMW1-/HMW2- mutant        2.0 ± 0.3       2.3 ± 0.3

Strain 5 derivatives
  wild type                 78.7 ± 3.2      100.0 ± 4.1
  HMW1-like mutant          15.7 ± 2.6      19.9 ± 3.3
  HMW2-like mutant          103.7 ± 14.0    131.7 ± 17.8
  double mutant             3.5 ± 0.6       4.4 ± 0.8

* Numbers represent mean (± standard error of the mean) of
measurements in triplicate or quadruplicate from representative
experiments.

† Adherence values for strain 12 derivatives are relative to strain 12
wild type; values for strain 5 derivatives are relative to strain 5 wild
type.

Table 2. Adherence by E. coli DH5α and HB101 harboring
hmw1 or hmw2 gene clusters.

                            Adherence relative to
Strain*                     H. influenzae strain 12†

DH5α (pT7-7)                0.7 ± 0.02
DH5α (pHMW1-14)             114.2 ± 15.9
DH5α (pHMW2-21)             14.0 ± 3.7

HB101 (pT7-7)               1.2 ± 0.5
HB101 (pHMW1-14)            93.6 ± 15.8
HB101 (pHMW2-21)            3.6 ± 0.9

* The plasmid pHMW1-14 contains the hmw1 gene cluster, while
pHMW2-21 contains the hmw2 gene cluster; pT7-7 is the cloning
vector used in these constructs.

† Numbers represent the mean (± standard error of the mean) of
measurements made in triplicate from representative experiments.
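
The "relative to wild type" column of Table 1 and the fold changes quoted in Examples 3 to 5 follow from the tabulated means by simple division. The sketch below reproduces that arithmetic from the mean values in Tables 1 and 2; the dictionary layout and function names are illustrative choices, and standard errors are not propagated.

```python
# Mean percent-of-inoculum values from Table 1 (standard errors omitted).
TABLE_1 = {
    "strain 12": {"wild type": 87.7, "HMW1-": 6.0, "HMW2-": 89.9, "HMW1-/HMW2-": 2.0},
    "strain 5": {"wild type": 78.7, "HMW1-like-": 15.7, "HMW2-like-": 103.7, "double": 3.5},
}


def relative_to_wild_type(percent, wild_type_percent):
    """Express a percent-adherence value as a percentage of the wild type value."""
    return 100.0 * percent / wild_type_percent


def fold_decrease(reference, value):
    """How many times lower `value` is than `reference`."""
    return reference / value


for strain, rows in TABLE_1.items():
    wt = rows["wild type"]
    for name, pct in rows.items():
        rel = relative_to_wild_type(pct, wt)
        print(f"{strain:9s} {name:12s} {rel:6.1f}% of wild type, "
              f"{fold_decrease(wt, pct):5.1f}-fold below wild type")

# Table 2 means: DH5α carrying the hmw2 cluster versus vector alone.
hmw2_over_vector = 14.0 / 0.7
print(f"DH5α (pHMW2-21) over DH5α (pT7-7): {hmw2_over_vector:.0f}-fold")
```

The computed ratios (about 14.6-fold and 43.9-fold for strain 12, about 5.0-fold and 22.5-fold for strain 5) sit close to the rounded "about 15-fold", "approximately 50-fold", "about 5-fold" and "approximately 25-fold" figures in the text.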
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SEQUENCE LISTING 
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(i) APPLICANT: BARENKAMP, STEPHEN J 

ST. GEME III, JOSEPH W

(ii) TITLE OF INVENTION: HIGH MOLECULAR WEIGHT SURFACE PROTEINS
OF NON-TYPEABLE HAEMOPHILUS

(iii) NUMBER OF SEQUENCES: 8

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Shoemaker and Mattare, Ltd
(B) STREET: 2001 Jefferson Davis Hwy., 1203 Crystal Plaza Bldg. 1
(C) CITY: Arlington
(D) STATE: Virginia
(E) COUNTRY: U.S.A.
(F) ZIP: 22202-0286

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: US 08/038,682
(B) FILING DATE: 16-MAR-1993
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: BERKSTRESSER, JERRY W
(B) REGISTRATION NUMBER: 22,651
(C) REFERENCE/DOCKET NUMBER: 1038-293

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (703) 415-0810
(B) TELEFAX: (703) 415-0813

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5116 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
ACAGCGTTCT CTTAATACTA GTACAAACCC ACAATAAAAT ATGACAAACA ACAATTACAA 
CACCTTTTTT GCAGTCTATA TGCAAATATT TTAAAAAATA GTATAAATCC GCCATATAAA 
ATGGTATAAT CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC ATCTTTCATC 
TTTCATCTTT CATCTTTCAT CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC 
ACATGCCCTG ATGAACCGAG GGAAGGGAGG GAGGGGCAAG AATGAAGAGG GAGCTGAACG
AACGCAAATG ATAAAGTAAT TTAATTGTTC AACTAACCTT AGGAGAAAAT ATGAACAAGC    360
TATATCGTCT CAAATTCAGC AAACGCCTGA ATGCTTTGGT TGCTGTGTCT GAATTGGCAC    420
GGGGTTGTGA CCATTCCACA GAAAAAGGCA GCGAAAAACC TGCTCGCATG AAAGTGCGTC    480
ACTTAGCGTT AAAGCCACTT TCCGCTATGT TACTATCTTT AGGTGTAACA TCTATTCCAC    540
AATCTGTTTT AGCAAGCGGC TTACAAGGAA TGGATGTAGT ACACGGCACA GCCACTATGC    600
AAGTAGATGG TAATAAAACC ATTATCCGCA ACAGTGTTGA CGATATCATT AATTGGAAAC    660
AATTTAACAT CGACCAAAAT GAAATGGTGC AGTTTTTACA AGAAAACAAC AACTCCGCCG    720
TATTCAACCG TGTTACATCT AACCAAATCT CCCAATTAAA AGGGATTTTA GATTCTAACG    780
GACAAGTCTT TTTAATCAAC CCAAATGGTA TCACAATAGG TAAAGACGCA ATTATTAACA    840
CTAATGGCTT TACGGCTTCT ACGCTAGACA TTTCTAACGA AAACATCAAG GCGCGTAATT    900
TCACCTTCGA GCAAACCAAA GATAAAGCGC TCGCTGAAAT TGTGAATCAC GGTTTAATTA    960
CTGTCGGTAA AGACGGCAGT GTAAATCTTA TTGGTGGCAA AGTGAAAAAC GAGGGTGTGA   1020
TTAGCGTAAA TGGTGGCAGC ATTTCTTTAC TCGCAGGGCA AAAAATCACC ATCAGCGATA   1080
TAATAAACCC AACCATTACT TACAGCATTG CCGCGCCTGA AAATGAAGCG GTCAATCTGG   1140
GCGATATTTT TGCCAAAGGC GGTAACATTA ATGTCCGTGC TGCCACTATT CGAAACCAAG   1200
GTAAACTTTC TGCTGATTCT GTAAGCAAAG ATAAAAGCGG CAATATTGTT CTTTCCGCCA   1260
AAGAGGGTGA AGCGGAAATT GGCGGTGTAA TTTCCGCTCA AAATCAGCAA GCTAAAGGCG   1320
GCAAGCTGAT GATTACAGGC GATAAAGTCA CATTAAAAAC AGGTGCAGTT ATCGACCTTT   1380
CAGGTAAAGA AGGGGGAGAA ACTTACCTTG GCGGTGACGA GCGCGGCGAA GGTAAAAAGG   1440
GCATTCAATT AGCAAAGAAA ACCTCTTTAG AAAAAGGCTC AACCATCAAT GTATCAGGCA   1500
AAGAAAAAGG CGGACGCGCT ATTGTGTGGG GCGATATTGC GTTAATTGAC GGCAATATTA   1560
ACGCTCAAGG TAGTGGTGAT ATCGCTAAAA CCGGTGGTTT TGTGGAGACG TCGGGGCATG   1620
ATTTATTCAT CAAAGACAAT GCAATTGTTG ACGCCAAAGA GTGGTTGTTA GACCCGGATA   1680
ATGTATCTAT TAATGCAGAA ACAGCAGGAC GCAGCAATAC TTCAGAAGAC GATGAATACA   1740
CGGGATCCGG GAATAGTGCC AGCACCCCAA AACGAAACAA AGAAAAGACA ACATTAACAA   1800
ACACAACTCT TGAGAGTATA CTAAAAAAAG GTACCTTTGT TAACATCACT GCTAATCAAC   1860
GCATCTATGT CAATAGCTCC ATTAATTTAT CCAATGGCAG CTTAACTCTT TGGAGTGAGG   1920
GTCGGAGCGG TGGCGGCGTT GAGATTAACA ACGATATTAC CACCGGTGAT GATACCAGAG   1980
GTGCAAACTT AACAATTTAC TCAGGCGGCT GGGTTGATGT TCATAAAAAT ATCTCACTCG   2040
GGGCGCAAGG TAACATAAAC ATTACAGCTA AACAAGATAT CGCCTTTGAG AAAGGAAGCA   2100
ACCAAGTCAT TACAGGTCAA GGGACTATTA CCTCAGGCAA TCAAAAAGGT TTTAGATTTA   2160
ATAATGTCTC TCTAAACGGC ACTGGCAGCG GACTGCAATT CACCACTAAA AGAACCAATA   2220
AATACGCTAT CACAAATAAA TTTGAAGGGA CTTTAAATAT TTCAGGGAAA GTGAACATCT   2280
CAATGGTTTT ACCTAAAAAT GAAAGTGGAT ATGATAAATT CAAAGGACGC ACTTACTGGA   2340
ATTTAACCTC CTTAAATGTT TCCGAGAGTG GCGAGTTTAA CCTCACTATT GACTCCAGAG
GAAGCGATAG TGCAGGCACA CTTACCCAGC CTTATAATTT AAACGGTATA TCATTCAACA 
AAGACACTAC CTTTAATGTT GAACGAAATG CAAGAGTCAA CTTTGACATC AAGGCACCAA 
TAGGGATAAA TAAGTATTCT AGTTTGAATT ACGCATCATT TAATGGAAAC ATTTCAGTTT
CGGGAGGGGG GAGTGTTGAT TTCACACTTC TCGCCTCATC CTCTAACGTC CAAACCCCCG 
GTGTAGTTAT AAATTCTAAA TACTTTAATG TTTCAACAGG GTCAAGTTTA AGATTTAAAA 
CTTCAGGCTC AACAAAAACT GGCTTCTCAA TAGAGAAAGA TTTAACTTTA AATGCCACCG 
GAGGCAACAT AACACTTTTG CAAGTTGAAG GCACCGATGG AATGATTGGT AAAGGCATTG 
TAGCCAAAAA AAACATAACC TTTGAAGGAG GTAACATCAC CTTTGGCTCC AGGAAAGCCG 
TAACAGAAAT CGAAGGCAAT GTTACTATCA ATAACAACGC TAACGTCACT CTTATCGGTT
CGGATTTTGA CAACCATCAA AAACCTTTAA CTATTAAAAA AGATGTCATC ATTAATAGCG 
GCAACCTTAC CGCTGGAGGC AATATTGTCA ATATAGCCGG AAATCTTACC GTTGAAAGTA 
ACGCTAATTT CAAAGCTATC ACAAATTTCA CTTTTAATGT AGGCGGCTTG TTTGACAACA 
AAGGCAATTC AAATATTTCC ATTGCCAAAG GAGGGGCTCG CTTTAAAGAC ATTGATAATT
CCAAGAATTT AAGCATCACC ACCAACTCCA GCTCCACTTA CCGCACTATT ATAAGCGGCA 
ATATAACCAA TAAAAACGGT GATTTAAATA TTACGAACGA AGGTAGTGAT ACTGAAATGC 
AAATTGGCGG CGATGTCTCG CAAAAAGAAG GTAATCTCAC GATTTCTTCT GACAAAATCA 
ATATTACCAA ACAGATAACA ATCAAGGCAG GTGTTGATGG GGAGAATTCC GATTCAGACG 
CGACAAACAA TGCCAATCTA ACCATTAAAA CCAAAGAATT GAAATTAACG CAAGACCTAA
ATATTTCAGG TTTCAATAAA GCAGAGATTA CAGCTAAAGA TGGTAGTGAT TTAACTATTG 
GTAACACCAA TAGTGCTGAT GGTACTAATG CCAAAAAAGT AACCTTTAAC CAGGTTAAAG 
ATTCAAAAAT CTCTGCTGAC GGTCACAAGG TGACACTACA CAGCAAAGTG GAAACATCCG 
GTAGTAATAA CAACACTGAA GATAGCAGTG ACAATAATGC CGGCTTAACT ATCGATGCAA 
AAAATGTAAC AGTAAACAAC AATATTACTT CTCACAAAGC AGTGAGCATC TCTGCGACAA 
GTGGAGAAAT TACCACTAAA ACAGGTACAA CCATTAACGC AACCACTGGT AACGTGGAGA 
TAACCGCTCA AACAGGTAGT ATCCTAGGTG GAATTGAGTC CAGCTCTGGC TCTGTAACAC
TTACTGCAAC CGAGGGCGCT CTTGCTGTAA GCAATATTTC GGGCAACACC GTTACTGTTA 
CTGCAAATAG CGGTGCATTA ACCACTTTGG CAGGCTCTAC AATTAAAGGA ACCGAGAGTG
TAACCACTTC AAGTCAATCA GGCGATATCG GCGGTACGAT TTCTGGTGGC ACAGTAGAGG 
TTAAAGCAAC CGAAAGTTTA ACCACTCAAT CCAATTCAAA AATTAAAGCA ACAACAGGCG 
AGGCTAACGT AACAAGTGCA ACAGGTACAA TTGGTGGTAC GATTTCCGGT AATACGGTAA 
ATGTTACGGC AAACGCTGGC GATTTAACAG TTGGGAATGG CGCAGAAATT AATGCGACAG 
AAGGAGCTGC AACCTTAACT ACATCATCGG GCAAATTAAC TACCGAAGCT AGTTCACACA 
TTACTTCAGC CAAGGGTCAG GTAAATCTTT CAGCTCAGGA TGGTAGCGTT GCAGGAAGTA
TTAATGCCGC CAATGTGACA CTAAATACTA CAGGCACTTT AACTACCGTG AAGGGTTCAA   4440
ACATTAATGC AACCAGCGGT ACCTTGGTTA TTAACGCAAA AGACGCTGAG CTAAATGGCG   4500
CAGCATTGGG TAACCACACA GTGGTAAATG CAACCAACGC AAATGGCTCC GGCAGCGTAA   4560
TCGCGACAAC CTCAAGCAGA GTGAACATCA CTGGGGATTT AATCACAATA AATGGATTAA   4620
ATATCATTTC AAAAAACGGT ATAAACACCG TACTGTTAAA AGGCGTTAAA ATTGATGTGA   4680
AATACATTCA ACCGGGTATA GCAAGCGTAG ATGAAGTAAT TGAAGCGAAA CGCATCCTTG   4740
AGAAGGTAAA AGATTTATCT GATGAAGAAA GAGAAGCGTT AGCTAAACTT GGAGTAAGTG   4800
CTGTACGTTT TATTGAGCCA AATAATACAA TTACAGTCGA TACACAAAAT GAATTTGCAA   4860
CCAGACCATT AAGTCGAATA [illegible] [illegible] [illegible] [illegible]   4920
GCGCGACGGT GTGCGTTAAT ATCGCTGATA ACGGGCGGTA GCGGTCAGTA ATTGACAAGG   4980
TAGATTTCAT CCTGCAATGA AGTCATTTTA TTTTCGTATT ATTTACTGTG TGGGTTAAAG   5040
TTCAGTACGG GCTTTACCCA TCTTGTAAAA AATTACGGAG AATACAATAA AGTATTTTTA   5100
ACAGGTTATT ATTATG                                                   5116



(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1536 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu
1 5 10 15

Val Ala Val Ser Glu Leu Ala Arg Gly Cys Asp His Ser Thr Glu Lys
20 25 30

Gly Ser Glu Lys Pro Ala Arg Met Lys Val Arg His Leu Ala Leu Lys
35 40 45

Pro Leu Ser Ala Met Leu Leu Ser Leu Gly Val Thr Ser Ile Pro Gln
50 55 60

Ser Val Leu Ala Ser Gly Leu Gln Gly Met Asp Val Val His Gly Thr
65 70 75 80

Ala Thr Met Gln Val Asp Gly Asn Lys Thr Ile Ile Arg Asn Ser Val
85 90 95

Asp Ala Ile Ile Asn Trp Lys Gln Phe Asn Ile Asp Gln Asn Glu Met
100 105 110

Val Gln Phe Leu Gln Glu Asn Asn Asn Ser Ala Val Phe Asn Arg Val
115 120 125

Thr Ser Asn Gln Ile Ser Gln Leu Lys Gly Ile Leu Asp Ser Asn Gly
130 135 140

Gln Val Phe Leu Ile Asn Pro Asn Gly Ile Thr Ile Gly Lys Asp Ala
145 150 155 160

Ile Ile Asn Thr Asn Gly Phe Thr Ala Ser Thr Leu Asp Ile Ser Asn
165 170 175

Glu Asn Ile Lys Ala Arg Asn Phe Thr Phe Glu Gln Thr Lys Asp Lys
180 185 190

Ala Leu Ala Glu Ile Val Asn His Gly Leu Ile Thr Val Gly Lys Asp
195 200 205

Gly Ser Val Asn Leu Ile Gly Gly Lys Val Lys Asn Glu Gly Val Ile
210 215 220

Ser Val Asn Gly Gly Ser Ile Ser Leu Leu Ala Gly Gln Lys Ile Thr
225 230 235 240

Ile Ser Asp Ile Ile Asn Pro Thr Ile Thr Tyr Ser Ile Ala Ala Pro
245 250 255

Glu Asn Glu Ala Val Asn Leu Gly Asp Ile Phe Ala Lys Gly Gly Asn
260 265 270

Ile Asn Val Arg Ala Ala Thr Ile Arg Asn Gln Gly Lys Leu Ser Ala
275 280 285

Asp Ser Val Ser Lys Asp Lys Ser Gly Asn Ile Val Leu Ser Ala Lys
290 295 300

Glu Gly Glu Ala Glu Ile Gly Gly Val Ile Ser Ala Gln Asn Gln Gln
305 310 315 320

Ala Lys Gly Gly Lys Leu Met Ile Thr Gly Asp Lys Val Thr Leu Lys
325 330 335

Thr Gly Ala Val Ile Asp Leu Ser Gly Lys Glu Gly Gly Glu Thr Tyr
340 345 350

Leu Gly Gly Asp Glu Arg Gly Glu Gly Lys Asn Gly Ile Gln Leu Ala
355 360 365

Lys Lys Thr Ser Leu Glu Lys Gly Ser Thr Ile Asn Val Ser Gly Lys
370 375 380

Glu Lys Gly Gly Arg Ala Ile Val Trp Gly Asp Ile Ala Leu Ile Asp
385 390 395 400

Gly Asn Ile Asn Ala Gln Gly Ser Gly Asp Ile Ala Lys Thr Gly Gly
405 410 415

Phe Val Glu Thr Ser Gly His Asp Leu Phe Ile Lys Asp Asn Ala Ile
420 425 430

Val Asp Ala Lys Glu Trp Leu Leu Asp Phe Asp Asn Val Ser Ile Asn
435 440 445

Ala Glu Thr Ala Gly Arg Ser Asn Thr Ser Glu Asp Asp Glu Tyr Thr
450 455 460

Gly Ser Gly Asn Ser Ala Ser Thr Pro Lys Arg Asn Lys Glu Lys Thr
465 470 475 480

Thr Leu Thr Asn Thr Thr Leu Glu Ser Ile Leu Lys Lys Gly Thr Phe
485 490 495

Val Asn Ile Thr Ala Asn Gln Arg Ile Tyr Val Asn Ser Ser Ile Asn
500 505 510

Leu Ser Asn Gly Ser Leu Thr Leu Trp Ser Glu Gly Arg Ser Gly Gly 
515 520 525 

Gly Val Glu Ile Asn Asn Asp Ile Thr Thr Gly Asp Asp Thr Arg Gly
530 535 540

Ala Asn Leu Thr Ile Tyr Ser Gly Gly Trp Val Asp Val His Lys Asn
545 550 555 560

Ile Ser Leu Gly Ala Gln Gly Asn Ile Asn Ile Thr Ala Lys Gln Asp
565 570 575

Ile Ala Phe Glu Lys Gly Ser Asn Gln Val Ile Thr Gly Gln Gly Thr
580 585 590

Ile Thr Ser Gly Asn Gln Lys Gly Phe Arg Phe Asn Asn Val Ser Leu
595 600 605

Asn Gly Thr Gly Ser Gly Leu Gln Phe Thr Thr Lys Arg Thr Asn Lys
610 615 620

Tyr Ala Ile Thr Asn Lys Phe Glu Gly Thr Leu Asn Ile Ser Gly Lys
625 630 635 640

Val Asn Ile Ser Met Val Leu Pro Lys Asn Glu Ser Gly Tyr Asp Lys
645 650 655

Phe Lys Gly Arg Thr Tyr Trp Asn Leu Thr Ser Leu Asn Val Ser Glu
660 665 670

Ser Gly Glu Phe Asn Leu Thr Ile Asp Ser Arg Gly Ser Asp Ser Ala
675 680 685

Gly Thr Leu Thr Gln Pro Tyr Asn Leu Asn Gly Ile Ser Phe Asn Lys
690 695 700

Asp Thr Thr Phe Asn Val Glu Arg Asn Ala Arg Val Asn Phe Asp Ile
705 710 715 720

Lys Ala Pro Ile Gly Ile Asn Lys Tyr Ser Ser Leu Asn Tyr Ala Ser
725 730 735

Phe Asn Gly Asn Ile Ser Val Ser Gly Gly Gly Ser Val Asp Phe Thr
740 745 750

Leu Leu Ala Ser Ser Ser Asn Val Gln Thr Pro Gly Val Val Ile Asn
755 760 765

Ser Lys Tyr Phe Asn Val Ser Thr Gly Ser Ser Leu Arg Phe Lys Thr
770 775 780

Ser Gly Ser Thr Lys Thr Gly Phe Ser Ile Glu Lys Asp Leu Thr Leu
785 790 795 800

Asn Ala Thr Gly Gly Asn Ile Thr Leu Leu Gln Val Glu Gly Thr Asp
805 810 815

Gly Met Ile Gly Lys Gly Ile Val Ala Lys Lys Asn Ile Thr Phe Glu
820 825 830

Gly Gly Asn Ile Thr Phe Gly Ser Arg Lys Ala Val Thr Glu Ile Glu
835 840 845

Gly Asn Val Thr Ile Asn Asn Asn Ala Asn Val Thr Leu Ile Gly Ser
850 855 860

Asp Phe Asp Asn His Gln Lys Pro Leu Thr Ile Lys Lys Asp Val Ile
865 870 875 880

Ile Asn Ser Gly Asn Leu Thr Ala Gly Gly Asn Ile Val Asn Ile Ala
885 890 895

Gly Asn Leu Thr Val Glu Ser Asn Ala Asn Phe Lys Ala Ile Thr Asn
900 905 910

Phe Thr Phe Asn Val Gly Gly Leu Phe Asp Asn Lys Gly Asn Ser Asn
915 920 925

Ile Ser Ile Ala Lys Gly Gly Ala Arg Phe Lys Asp Ile Asp Asn Ser
930 935 940

Lys Asn Leu Ser Ile Thr Thr Asn Ser Ser Ser Thr Tyr Arg Thr Ile
945 950 955 960

Ile Ser Gly Asn Ile Thr Asn Lys Asn Gly Asp Leu Asn Ile Thr Asn
965 970 975

Glu Gly Ser Asp Thr Glu Met Gln Ile Gly Gly Asp Val Ser Gln Lys
980 985 990

Glu Gly Asn Leu Thr Ile Ser Ser Asp Lys Ile Asn Ile Thr Lys Gln
995 1000 1005

Ile Thr Ile Lys Ala Gly Val Asp Gly Glu Asn Ser Asp Ser Asp Ala
1010 1015 1020

Thr Asn Asn Ala Asn Leu Thr Ile Lys Thr Lys Glu Leu Lys Leu Thr
1025 1030 1035 1040

Gln Asp Leu Asn Ile Ser Gly Phe Asn Lys Ala Glu Ile Thr Ala Lys
1045 1050 1055

Asp Gly Ser Asp Leu Thr Ile Gly Asn Thr Asn Ser Ala Asp Gly Thr
1060 1065 1070

Asn Ala Lys Lys Val Thr Phe Asn Gln Val Lys Asp Ser Lys Ile Ser
1075 1080 1085

Ala Asp Gly His Lys Val Thr Leu His Ser Lys Val Glu Thr Ser Gly
1090 1095 1100

Ser Asn Asn Asn Thr Glu Asp Ser Ser Asp Asn Asn Ala Gly Leu Thr
1105 1110 1115 1120

Ile Asp Ala Lys Asn Val Thr Val Asn Asn Asn Ile Thr Ser His Lys
1125 1130 1135

Ala Val Ser Ile Ser Ala Thr Ser Gly Glu Ile Thr Thr Lys Thr Gly
1140 1145 1150

Thr Thr Ile Asn Ala Thr Thr Gly Asn Val Glu Ile Thr Ala Gln Thr
1155 1160 1165

Gly Ser Ile Leu Gly Gly Ile Glu Ser Ser Ser Gly Ser Val Thr Leu
1170 1175 1180

Thr Ala Thr Glu Gly Ala Leu Ala Val Ser Asn Ile Ser Gly Asn Thr
1185 1190 1195 1200 
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Val Thr Val Thr Ala Asn Ser Gly Ala Leu Thr Thr Leu Ala Gly Ser 
1205 1210 1215 

Thr Ile Lys Gly Thr Glu Ser Val Thr Thr Ser Ser Gln Ser Gly Asp
1220 1225 1230

Ile Gly Gly Thr Ile Ser Gly Gly Thr Val Glu Val Lys Ala Thr Glu
1235 1240 1245

Ser Leu Thr Thr Gln Ser Asn Ser Lys Ile Lys Ala Thr Thr Gly Glu
1250 1255 1260

Ala Asn Val Thr Ser Ala Thr Gly Thr Ile Gly Gly Thr Ile Ser Gly
1265 1270 1275 1280

Asn Thr Val Asn Val Thr Ala Asn Ala Gly Asp Leu Thr Val Gly Asn
1285 1290 1295

Gly Ala Glu Ile Asn Ala Thr Glu Gly Ala Ala Thr Leu Thr Thr Ser
1300 1305 1310

Ser Gly Lys Leu Thr Thr Glu Ala Ser Ser His Ile Thr Ser Ala Lys
1315 1320 1325

Gly Gln Val Asn Leu Ser Ala Gln Asp Gly Ser Val Ala Gly Ser Ile
1330 1335 1340

Asn Ala Ala Asn Val Thr Leu Asn Thr Thr Gly Thr Leu Thr Thr Val
1345 1350 1355 1360

Lys Gly Ser Asn Ile Asn Ala Thr Ser Gly Thr Leu Val Ile Asn Ala
1365 1370 1375

Lys Asp Ala Glu Leu Asn Gly Ala Ala Leu Gly Asn His Thr Val Val
1380 1385 1390

Asn Ala Thr Asn Ala Asn Gly Ser Gly Ser Val Ile Ala Thr Thr Ser
1395 1400 1405

Ser Arg Val Asn Ile Thr Gly Asp Leu Ile Thr Ile Asn Gly Leu Asn
1410 1415 1420

Ile Ile Ser Lys Asn Gly Ile Asn Thr Val Leu Leu Lys Gly Val Lys
1425 1430 1435 1440

Ile Asp Val Lys Tyr Ile Gln Pro Gly Ile Ala Ser Val Asp Glu Val
1445 1450 1455

Ile Glu Ala Lys Arg Ile Leu Glu Lys Val Lys Asp Leu Ser Asp Glu
1460 1465 1470

Glu Arg Glu Ala Leu Ala Lys Leu Gly Val Ser Ala Val Arg Phe Ile
1475 1480 1485

Glu Pro Asn Asn Thr Ile Thr Val Asp Thr Gln Asn Glu Phe Ala Thr
1490 1495 1500

Arg Pro Leu Ser Arg Ile Val Ile Ser Glu Gly Arg Ala Cys Phe Ser
1505 1510 1515 1520

Asn Ser Asp Gly Ala Thr Val Cys Val Asn Ile Ala Asp Asn Gly Arg



1525

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4937 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
TAAATATACA AGATAATAAA AATAAATCAA GATTTTTGTG ATGACAAACA ACAATTACAA 60
CACCTTTTTT GCAGTCTATA TGCAAATATT TTAAAAAAAT AGTATAAATC CGCCATATAA 120
AATGGTATAA TCTTTCATCT TTCATCTTTA ATCTTTCATC TTTCATCTTT CATCTTTCAT 180
CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC ATCTTTCATC TTTCATCTTT 240
CACATGAAAT GATGAACCGA GGGAAGGGAG GGAGGGGCAA GAATGAAGAG GGAGCTGAAC 300
GAACGCAAAT GATAAAGTAA TTTAATTGTT CAACTAACCT TAGGAGAAAA TATGAACAAG 360
ATATATCGTC TCAAATTCAG CAAACGCCTG AATGCTTTGG TTGCTGTGTC TGAATTGGCA 420
CGGGGTTGTG ACCATTCCAC AGAAAAAGGC TTCCGCTATG TTACTATCTT TAGGTGTAAC 480
CACTTAGCGT TAAAGCCACT TTCCGCTATG TTACTATCTT TAGGTGTAAC ATCTATTCCA 540
CAATCTGTTT TAGCAAGCGG CTTACAAGGA ATGGATGTAG TACACGGCAC AGCCACTATG 600
CAAGTAGATG GTAATAAAAC CATTATCCGC AACAGTGTTG ACGCTATCAT TAATTGGAAA 660
CAATTTAACA TCGACCAAAA TGAAATGGTG CAGTTTTTAC AAGAAAACAA CAACTCCGCC 720
GTATTCAACC GTGTTACATC TAACCAAATC TCCCAATTAA AAGGGATTTT AGATTCTAAC 780
GGACAAGTCT TTTTAATCAA CCCAAATGGT ATCACAATAG GTAAAGACGC AATTATTAAC 840
ACTAATGGCT TTACGGCTTC TACGCTAGAC ATTTCTAACG AAAACATCAA GGCGCGTAAT 900
TTCACCTTCG AGCAAACCAA AGATAAAGCG CTCGCTGAAA TTGTGAATCA CGGTTTAATT 960
ACTGTCGGTA AAGACGGCAG TGTAAATCTT ATTGGTGGCA AAGTGAAAAA CGAGGGTGTG 1020
ATTAGCGTAA ATGGTGGCAG CATTTCTTTA CTCGCAGGGC AAAAAATCAC CATCAGCGAT 1080
ATAATAAACC CAACCATTAC TTACAGCATT GCCGCGCCTG AAAATGAAGC GGTCAATCTG 1140
GGCGATATTT TTGCCAAAGG CGGTAACATT AATGTCCGTG CTGCCACTAT TCGAAACCAA 1200
GGTAAACTTT CTGCTGATTC TGTAAGCAAA GATAAAAGCG GCAATATTGT TCTTTCCGCC 1260
AAAGAGGGTG AAGCGGAAAT TGGCGGTGTA ATTTCCGCTC AAAATCAGCA AGCTAAAGGC 1320
GGCAAGCTGA TGATTACAGG CGATAAAGTC ACATTAAAAA CAGGTGCAGT TATCGACCTT 1380
TCAGGTAAAG AAGGGGGAGA AACTTACCTT GGCGGTGACG AGCGCGGCGA AGGTAAAAAC 1440
GGCATTCAAT TAGCAAAGAA AACCTCTTTA GAAAAAGGCT CAACCATCAA TGTATCAGGC 1500
AAAGAAAAAG GCGGACGCGC TATTGTGTGG GGCGATATTG CGTTAATTGA CGGCAATATT 1560
AACGCTCAAG GTAGTGGTGA TATCGCTAAA ACCGGTGGTT TTGTGGAGAC ATCGGGGCAT 1620
TATTTATCCA TTGACAGCAA TGCAATTGTT AAAACAAAAG AGTGGTTGCT AGACCCTGAT 1680
GATGTAACAA TTGAAGCCGA AGACCCCCTT CGCAATAATA CCGGTATAAA TGATGAATTC 1740
CCAACAGGCA CCGGTGAAGC AAGCGACCCT AAAAAAAATA GCGAACTCAA AACAACGCTA 1800
ACCAATACAA CTATTTCAAA TTATCTGAAA AACGCCTGGA CAATGAATAT AACGGCATCA 1860
AGAAAACTTA CCGTTAATAG CTCAATCAAC ATCGGAAGCA ACTCCCACTT AATTCTCCAT 1920
AGTAAAGGTC AGCGTGGCGG AGGCGTTCAG ATTGATGGAG ATATTACTTC TAAAGGCGGA 1980
AATTTAACCA TTTATTCTGG CGGATGGGTT GATGTTCATA AAAATATTAC GCTTGATCAG 2040
GGTTTTTTAA ATATTACCGC CGCTTCCGTA GCTTTTGAAG GTGGAAATAA CAAAGCACGC 2100
GACGCGGCAA ATGCTAAAAT TGTCGCCCAG GGCACTGTAA CCATTACAGG AGAGGGAAAA 2160
GATTTCAGGG CTAACAACGT ATCTTTAAAC GGAACGGGTA AAGGTCTGAA TATCATTTCA 2220
TCAGTGAATA ATTTAACCCA CAATCTTAGT GGCACAATTA ACATATCTGG GAATATAACA 2280
ATTAACCAAA CTACGAGAAA GAACACCTCG TATTGGCAAA CCAGCCATGA TTCGCACTGG 2340
AACGTCAGTG CTCTTAATCT AGAGACAGGC GCAAATTTTA CCTTTATTAA ATACATTTCA 2400
AGCAATAGCA AAGGCTTAAC AACACAGTAT AGAAGCTCTG CAGGGGTGAA TTTTAACGGC 2460
GTAAATGGCA ACATGTCATT CAATCTCAAA GAAGGAGCGA AAGTTAATTT CAAATTAAAA 2520
CCAAACGAGA ACATGAACAC AAGCAAACCT TTACCAATTC GGTTTTTAGC CAATATCACA 2580
GCCACTGGTG GGGGCTCTGT TTTTTTTGAT ATATATGCCA ACCATTCTGG CAGAGGGGCT 2640
GAGTTAAAAA TGAGTGAAAT TAATATCTCT AACGGCGCTA ATTTTACCTT AAATTCCCAT 2700
GTTCGCGGCG ATGACGCTTT TAAAATCAAC AAAGACTTAA CCATAAATGC AACCAATTCA 2760
AATTTCAGCC TCAGACAGAC GAAAGATGAT TTTTATGACG GGTACGCACG CAATGCCATC 2820
AATTCAACCT ACAACATATC CATTCTGGGC GGTAATGTCA CCCTTGGTGG ACAAAACTCA 2880
AGCAGCAGCA TTACGGGGAA TATTACTATC GAGAAAGCAG CAAATGTTAC GCTAGAAGCC 2940
AATAACGCCC CTAATCAGCA AAACATAAGG GATAGAGTTA TAAAACTTGG CAGCTTGCTC 3000
GTTAATGGGA GTTTAAGTTT AACTGGCGAA AATGCAGATA TTAAAGGCAA TCTCACTATT 3060
TCAGAAAGCG CCACTTTTAA AGGAAAGACT AGAGATACCC TAAATATCAC CGGCAATTTT 3120
ACCAATAATG GCACTGCCGA AATTAATATA ACACAAGGAG TGGTAAAACT TGGCAATGTT 3180
ACCAATGATG GTGATTTAAA CATTACCACT CACGCTAAAC GCAACCAAAG AAGCATCATC 3240
GGCGGAGATA TAATCAACAA AAAAGGAAGC TTAAATATTA CAGACAGTAA TAATGATGCT 3300
GAAATCCAAA TTGGCGGCAA TATCTCGCAA AAAGAAGGCA ACCTCACGAT TTCTTCCGAT 3360
AAAATTAATA TCACCAAACA GATAACAATC AAAAAGGGTA TTGATGGAGA GGACTCTAGT 3420
TCAGATGCGA CAAGTAATGC CAACCTAACT ATTAAAACCA AAGAATTGAA ATTGACAGAA 3480
GACCTAAGTA TTTCAGGTTT CAATAAAGCA GAGATTACAG CCAAAGATGG TAGAGATTTA 3540
ACTATTGGCA ACAGTAATGA CGGTAACAGC GGTGCCGAAG CCAAAACAGT AACTTTTAAC 3600
AATGTTAAAG ATTCAAAAAT CTCTGCTGAC GGTCACAATG TGACACTAAA TAGCAAAGTG 3660




AAAACATCTA GCAGCAATGG CGGACGTGAA AGCAATAGCG ACAACGATAC CGGCTTAACT 3720
ATTACTGCAA AAAATGTAGA AGTAAACAAA GATATTACTT CTCTCAAAAC AGTAAATATC 3780
ACCGCGTCGG AAAAGGTTAC CACCACAGCA GGCTCGACCA TTAACGCAAC AAATGGCAAA 3840
GCAAGTATTA CAACCAAAAC AGGTGATATC AGCGGTACGA TTTCCGGTAA CACGGTAAGT 3900
GTTAGCGCGA CTGGTGATTT AACCACTAAA TCCGGCTCAA AAATTGAAGC GAAATCGGGT 3960
GAGGCTAATG TAACAAGTGC AACAGGTACA ATTGGCGGTA CAATTTCCGG TAATACGGTA 4020
AATGTTACGG CAAACGCTGG CGATTTAACA GTTGGGAATG GCGCAGAAAT TAATGCGACA 4080
GAAGGAGCTG CAACCTTAAC CGCAACAGGG AATACCTTGA CTACTGAAGC CGGTTCTAGC 4140
ATCACTTCAA CTAAGGGTCA GGTAGACCTC TTGGCTCAGA ATGGTAGCAT CGCAGGAAGC 4200
ATTAATGCTG CTAATGTGAC ATTAAATACT ACAGGCACCT TAACCACCGT GGCAGGCTCG 4260
GATATTAAAG CAACCAGCGG CACCTTGGTT ATTAACGCAA AAGATGCTAA GCTAAATGGT 4320
GATGCATCAG GTGATAGTAC AGAAGTGAAT GCAGTCAACG CAAGCGGCTC TGGTAGTGTG 4380
ACTGCGGCAA CCTCAAGCAG TGTGAATATC ACTGGGGATT TAAACACAGT AAATGGGTTA 4440
AATATCATTT CGAAAGATGG TAGAAACACT GTGCGCTTAA GAGGCAAGGA AATTGAGGTG 4500
AAATATATCC AGCCAGGTGT AGCAAGTGTA GAAGAAGTAA TTGAAGCGAA ACGCGTCCTT 4560
GAAAAAGTAA AAGATTTATC TGATGAAGAA AGAGAAACAT TAGCTAAACT TGGTGTAAGT 4620
GCTGTACGTT TTGTTGAGCC AAATAATACA ATTACAGTCA ATACACAAAA TGAATTTACA 4680
ACCAGACCGT CAAGTCAAGT GATAATTTCT GAAGGTAAGG CGTGTTTCTC AAGTGGTAAT 4740
GGCGCACGAG TATGTACCAA TGTTGCTGAC GATGGACAGC CGTAGTCAGT AATTGACAAG 4800
GTAGATTTCA TCCTGCAATG AAGTCATTTT ATTTTCGTAT TATTTACTGT GTGGGTTAAA 4860
GTTCAGTACG GGCTTTACCC ATCTTGTAAA AAATTACGGA GAATACAATA AAGTATTTTT 4920
AACAGGTTAT TATTATG 4937

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1477 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu
1 5 10 15

Val Ala Val Ser Glu Leu Ala Arg Gly Cys Asp His Ser Thr Glu Lys
20 25 30

Gly Ser Glu Lys Pro Ala Arg Met Lys Val Arg His Leu Ala Leu Lys
35 40 45

Pro Leu Ser Ala Met Leu Leu Ser Leu Gly Val Thr Ser Ile Pro Gln
50 55 60

Ser Val Leu Ala Ser Gly Leu Gln Gly Met Asp Val Val His Gly Thr
65 70 75 80

Ala Thr Met Gln Val Asp Gly Asn Lys Thr Ile Ile Arg Asn Ser Val
85 90 95

Asp Ala Ile Ile Asn Trp Lys Gln Phe Asn Ile Asp Gln Asn Glu Met
100 105 110

Val Gln Phe Leu Gln Glu Asn Asn Asn Ser Ala Val Phe Asn Arg Val
115 120 125

Thr Ser Asn Gln Ile Ser Gln Leu Lys Gly Ile Leu Asp Ser Asn Gly
130 135 140

Gln Val Phe Leu Ile Asn Pro Asn Gly Ile Thr Ile Gly Lys Asp Ala
145 150 155 160

Ile Ile Asn Thr Asn Gly Phe Thr Ala Ser Thr Leu Asp Ile Ser Asn
165 170 175

Glu Asn Ile Lys Ala Arg Asn Phe Thr Phe Glu Gln Thr Lys Asp Lys
180 185 190

Ala Leu Ala Glu Ile Val Asn His Gly Leu Ile Thr Val Gly Lys Asp
195 200 205

Gly Ser Val Asn Leu Ile Gly Gly Lys Val Lys Asn Glu Gly Val Ile
210 215 220

Ser Val Asn Gly Gly Ser Ile Ser Leu Leu Ala Gly Gln Lys Ile Thr
225 230 235 240

Ile Ser Asp Ile Ile Asn Pro Thr Ile Thr Tyr Ser Ile Ala Ala Pro
245 250 255

Glu Asn Glu Ala Val Asn Leu Gly Asp Ile Phe Ala Lys Gly Gly Asn
260 265 270

Ile Asn Val Arg Ala Ala Thr Ile Arg Asn Gln Gly Lys Leu Ser Ala
275 280 285

Asp Ser Val Ser Lys Asp Lys Ser Gly Asn Ile Val Leu Ser Ala Lys
290 295 300

Glu Gly Glu Ala Glu Ile Gly Gly Val Ile Ser Ala Gln Asn Gln Gln
305 310 315 320

Ala Lys Gly Gly Lys Leu Met Ile Thr Gly Asp Lys Val Thr Leu Lys
325 330 335

Thr Gly Ala Val Ile Asp Leu Ser Gly Lys Glu Gly Gly Glu Thr Tyr
340 345 350

Leu Gly Gly Asp Glu Arg Gly Glu Gly Lys Asn Gly Ile Gln Leu Ala
355 360 365

Lys Lys Thr Ser Leu Glu Lys Gly Ser Thr Ile Asn Val Ser Gly Lys
370 375 380

Glu Lys Gly Gly Phe Ala Ile Val Trp Gly Asp Ile Ala Leu Ile Asp
385 390 395 400

Gly Asn Ile Asn Ala Gln Gly Ser Gly Asp Ile Ala Lys Thr Gly Gly
405 410 415

Phe Val Glu Thr Ser Gly His Asp Leu Phe Ile Lys Asp Asn Ala Ile
420 425 430

Val Asp Ala Lys Glu Trp Leu Leu Asp Phe Asp Asn Val Ser Ile Asn
435 440 445

Ala Glu Asp Pro Leu Phe Asn Asn Thr Gly Ile Asn Asp Glu Phe Pro
450 455 460

Thr Gly Thr Gly Glu Ala Ser Asp Pro Lys Lys Asn Ser Glu Leu Lys
465 470 475 480

Thr Thr Leu Thr Asn Thr Thr Ile Ser Asn Tyr Leu Lys Asn Ala Trp
485 490 495

Thr Met Asn Ile Thr Ala Ser Arg Lys Leu Thr Val Asn Ser Ser Ile
500 505 510

Asn Ile Gly Ser Asn Ser His Leu Ile Leu His Ser Lys Gly Gln Arg
515 520 525

Gly Gly Gly Val Gln Ile Asp Gly Asp Ile Thr Ser Lys Gly Gly Asn
530 535 540

Leu Thr Ile Tyr Ser Gly Gly Trp Val Asp Val His Lys Asn Ile Thr
545 550 555 560

Leu Asp Gln Gly Phe Leu Asn Ile Thr Ala Ala Ser Val Ala Phe Glu
565 570 575

Gly Gly Asn Asn Lys Ala Arg Asp Ala Ala Asn Ala Lys Ile Val Ala
580 585 590

Gln Gly Thr Val Thr Ile Thr Gly Glu Gly Lys Asp Phe Arg Ala Asn
595 600 605

Asn Val Ser Leu Asn Gly Thr Gly Lys Gly Leu Asn Ile Ile Ser Ser
610 615 620

Val Asn Asn Leu Thr His Asn Leu Ser Gly Thr Ile Asn Ile Ser Gly
625 630 635 640

Asn Ile Thr Ile Asn Gln Thr Thr Arg Lys Asn Thr Ser Tyr Trp Gln
645 650 655

Thr Ser His Asp Ser His Trp Asn Val Ser Ala Leu Asn Leu Glu Thr
660 665 670

Gly Ala Asn Phe Thr Phe Ile Lys Tyr Ile Ser Ser Asn Ser Lys Gly
675 680 685

Leu Thr Thr Gln Tyr Arg Ser Ser Ala Gly Val Asn Phe Asn Gly Val
690 695 700

Asn Gly Asn Met Ser Phe Asn Leu Lys Glu Gly Ala Lys Val Asn Phe
705 710 715 720

Lys Leu Lys Pro Asn Glu Asn Met Asn Thr Ser Lys Pro Leu Pro Ile
725 730 735

Arg Phe Leu Ala Asn Ile Thr Ala Thr Gly Gly Gly Ser Val Phe Phe
740 745 750

Asp Ile Tyr Ala Asn His Ser Gly Arg Gly Ala Glu Leu Lys Met Ser
755 760 765

Glu Ile Asn Ile Ser Asn Gly Ala Asn Phe Thr Leu Asn Ser His Val
770 775 780

Arg Gly Asp Asp Ala Phe Lys Ile Asn Lys Asp Leu Thr Ile Asn Ala
785 790 795 800

Thr Asn Ser Asn Phe Ser Leu Arg Gln Thr Lys Asp Asp Phe Tyr Asp
805 810 815

Gly Tyr Ala Arg Asn Ala Ile Asn Ser Thr Tyr Asn Ile Ser Ile Leu
820 825 830

Gly Gly Asn Val Thr Leu Gly Gly Gln Asn Ser Ser Ser Ser Ile Thr
835 840 845

Gly Asn Ile Thr Ile Glu Lys Ala Ala Asn Val Thr Leu Glu Ala Asn
850 855 860

Asn Ala Pro Asn Gln Gln Asn Ile Arg Asp Arg Val Ile Lys Leu Gly
865 870 875 880

Ser Leu Leu Val Asn Gly Ser Leu Ser Leu Thr Gly Glu Asn Ala Asp
885 890 895

Ile Lys Gly Asn Leu Thr Ile Ser Glu Ser Ala Thr Phe Lys Gly Lys
900 905 910

Thr Arg Asp Thr Leu Asn Ile Thr Gly Asn Phe Thr Asn Asn Gly Thr
915 920 925

Ala Glu Ile Asn Ile Thr Gln Gly Val Val Lys Leu Gly Asn Val Thr
930 935 940

Asn Asp Gly Asp Leu Asn Ile Thr Thr His Ala Lys Arg Asn Gln Arg
945 950 955 960

Ser Ile Ile Gly Gly Asp Ile Ile Asn Lys Lys Gly Ser Leu Asn Ile
965 970 975

Thr Asp Ser Asn Asn Asp Ala Glu Ile Gln Ile Gly Gly Asn Ile Ser
980 985 990

Gln Lys Glu Gly Asn Leu Thr Ile Ser Ser Asp Lys Ile Asn Ile Thr
995 1000 1005

Lys Gln Ile Thr Ile Lys Lys Gly Ile Asp Gly Glu Asp Ser Ser Ser
1010 1015 1020

Asp Ala Thr Ser Asn Ala Asn Leu Thr Ile Lys Thr Lys Glu Leu Lys
1025 1030 1035 1040

Leu Thr Glu Asp Leu Ser Ile Ser Gly Phe Asn Lys Ala Glu Ile Thr
1045 1050 1055

Ala Lys Asp Gly Arg Asp Leu Thr Ile Gly Asn Ser Asn Asp Gly Asn
1060 1065 1070

Ser Gly Ala Glu Ala Lys Thr Val Thr Phe Asn Asn Val Lys Asp Ser
1075 1080 1085

Lys Ile Ser Ala Asp Gly His Asn Val Thr Leu Asn Ser Lys Val Lys
1090 1095 1100

Thr Ser Ser Ser Asn Gly Gly Arg Glu Ser Asn Ser Asp Asn Asp Thr
1105 1110 1115 1120

Gly Leu Thr Ile Thr Ala Lys Asn Val Glu Val Asn Lys Asp Ile Thr
1125 1130 1135

Ser Leu Lys Thr Val Asn Ile Thr Ala Ser Glu Lys Val Thr Thr Thr
1140 1145 1150

Ala Gly Ser Thr Ile Asn Ala Thr Asn Gly Lys Ala Ser Ile Thr Thr
1155 1160 1165

Lys Thr Gly Asp Ile Ser Gly Thr Ile Ser Gly Asn Thr Val Ser Val
1170 1175 1180

Ser Ala Thr Val Asp Leu Thr Thr Lys Ser Gly Ser Lys Ile Glu Ala
1185 1190 1195 1200

Lys Ser Gly Glu Ala Asn Val Thr Ser Ala Thr Gly Thr Ile Gly Gly
1205 1210 1215

Thr Ile Ser Gly Asn Thr Val Asn Val Thr Ala Asn Ala Gly Asp Leu
1220 1225 1230

Thr Val Gly Asn Gly Ala Glu Ile Asn Ala Thr Glu Gly Ala Ala Thr
1235 1240 1245

Leu Thr Ala Thr Gly Asn Thr Leu Thr Thr Glu Ala Gly Ser Ser Ile
1250 1255 1260

Thr Ser Thr Lys Gly Gln Val Asp Leu Leu Ala Gln Asn Gly Ser Ile
1265 1270 1275 1280

Ala Gly Ser Ile Asn Ala Ala Asn Val Thr Leu Asn Thr Thr Gly Thr
1285 1290 1295

Leu Thr Thr Val Ala Gly Ser Asp Ile Lys Ala Thr Ser Gly Thr Leu
1300 1305 1310

Val Ile Asn Ala Lys Asp Ala Lys Leu Asn Gly Asp Ala Ser Gly Asp
1315 1320 1325

Ser Thr Glu Val Asn Ala Val Asn Ala Ser Gly Ser Gly Ser Val Thr
1330 1335 1340

Ala Ala Thr Ser Ser Ser Val Asn Ile Thr Gly Asp Leu Asn Thr Val
1345 1350 1355 1360

Asn Gly Leu Asn Ile Ile Ser Lys Asp Gly Arg Asn Thr Val Arg Leu
1365 1370 1375

Arg Gly Lys Glu Ile Glu Val Lys Tyr Ile Gln Pro Gly Val Ala Ser
1380 1385 1390

Val Glu Glu Val Ile Glu Ala Lys Arg Val Leu Glu Lys Val Lys Asp
1395 1400 1405

Leu Ser Asp Glu Glu Arg Glu Thr Leu Ala Lys Leu Gly Val Ser Ala
1410 1415 1420

Val Arg Phe Val Glu Pro Asn Asn Thr Ile Thr Val Asn Thr Gln Asn
1425 1430 1435 1440

Glu Phe Thr Thr Arg Pro Ser Ser Gln Val Ile Ile Ser Glu Gly Lys
1445 1450 1455

Ala Cys Phe Ser Ser Gly Asn Gly Ala Arg Val Cys Thr Asn Val Ala
1460 1465 1470

Asp Asp Gly Gln Pro
1475

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9171 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 



ACAGCGTTCT CTTAATACTA GTACAAACCC ACAATAAAAT ATGACAAACA ACAATTACAA 60
CACCTTTTTT GCAGTCTATA TGCAAATATT TTAAAAAATA GTATAAATCC GCCATATAAA 120
ATGGTATAAT CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC ATCTTTCATC 180
TTTCATCTTT CATCTTTCAT CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC 240
ACATGAAATG ATGAACCGAG GGAAGGGAGG GAGGGGCAAG AATGAAGAGG GAGCTGAACG 300
AACGCAAATG ATAAAGTAAT TTAATTGTTC AACTAACCTT AGGAGAAAAT ATGAACAAGA 360
TATATCGTCT CAAATTCAGC AAACGCCTGA ATGCTTTGGT TGCTGTGTCT GAATTGGCAC 420
GGGGTTGTGA CCATTCCACA GAAAAAGGCA GCGAAAAACC TGCTCGCATG AAAGTGCGTC 480
ACTTAGCGTT AAAGCCACTT TCCGCTATGT TACTATCTTT AGGTGTAACA TCTATTCCAC 540
AATCTGTTTT AGCAAGCGGC TTACAAGGAA TGGATGTAGT ACACGGCACA GCCACTATGC 600
AAGTAGATGG TAATAAAACC ATTATCCGCA ACAGTGTTGA CGCTATCATT AATTGGAAAC 660
AATTTAACAT CGACCAAAAT GAAATGGTGC AGTTTTTACA AGAAAACAAC AACTCCGCCG 720
TATTCAACCG TGTTACATCT AACCAAATCT CCCAATTAAA AGGGATTTTA GATTCTAACG 780
GACAAGTCTT TTTAATCAAC CCAAATGGTA TCACAATAGG TAAAGACGCA ATTATTAACA 840
CTAATGGCTT TACGGCTTCT ACGCTAGACA TTTCTAACGA AAACATCAAG GCGCGTAATT 900
TCACCTTCGA GCAAACCAAA GATAAAGCGC TCGCTGAAAT TGTGAATCAC GGTTTAATTA 960
CTGTCGGTAA AGACGGCAGT GTAAATCTTA TTGGTGGCAA AGTGAAAAAC GAGGGTGTGA 1020
TTAGCGTAAA TGGTGGCAGC ATTTCTTTAC TCGCAGGGCA AAAAATCACC ATCAGCGATA 1080
TAATAAACCC AACCATTACT TACAGCATTG CCGCGCCTGA AAATGAAGCG GTCAATCTGG 1140
GCGATATTTT TGCCAAAGGC GGTAACATTA ATGTCCGTGC TGCCACTATT CGAAACCAAG 1200
CTTTCCGCCA AAGAGGGTGA AGCGGAAATT GGCGGTGTAA TTTCCGCTCA AAATCAGCAA 1260
GCTAAAGGCG GCAAGCTGAT GATTACAGGC GATAAAGTCA CATTAAAAAC AGGTGCAGTT 1320
ATCGACCTTT CAGGTAAAGA AGGGGGAGAA ACTTACCTTG GCGGTGACGA GCGCGGCGAA 1380
GGTAAAAACG GCATTCAATT AGCAAAGAAA ACCTCTTTAG AAAAAGGCTC AACCATCAAT 1440
GTATCAGGCA AAGAAAAAGG CGGACGCGCT ATTGTGTGGG GCGATATTGC GTTAATTGAC 1500
GGCAATATTA ACGCTCAAGG TAGTGGTGAT ATCGCTAAAA CCGGTGGTTT TGTGGAGACG 1560
TCGGGGCATG ATTTATTCAT CAAAGACAAT GCAATTGTTG ACGCCAAAGA GTGGTTGTTA 1620
GACCCGGATA ATGTATCTAT TAATGCAGAA ACAGCAGGAC GCAGCAATAC TTCAGAAGAC 1680
GATGAATACA CGGGATCCGG GAATAGTGCC AGCACCCCAA AACGAAACAA AGAAAAGACA 1740
ACATTAACAA ACACAACTCT TGAGAGTATA CTAAAAAAAG GTACCTTTGT TAACATCACT 1800
GCTAATCAAC GCATCTATGT CAATAGCTCC ATTAATTTAT CCAATGGCAG CTTAACTCTT 1860
TGGAGTGAGG GTCGGAGCGG TGGCGGCGTT GAGATTAACA ACGATATTAC CACCGGTGAT 1920
GATACCAGAG GTGCAAACTT AACAATTTAC TCAGGCGGCT GGGTTGATGT TCATAAAAAT 1980
ATCTCACTCG GGGCGCAAGG TAACATAAAC ATTACAGCTA AACAAGATAT CGCCTTTGAG 2040
AAAGGAAGCA ACCAAGTCAT TACAGGTCAA GGGACTATTA CCTCAGGCAA TCAAAAAGGT 2100
TTTAGATTTA ATAATGTCTC TCTAAACGGC ACTGGCAGCG GACTGCAATT CACCACTAAA 2160
AGAACCAATA AATACGCTAT CACAAATAAA TTTGAAGGGA CTTTAAATAT TTCAGGGAAA 2220
GTGAACATCT CAATGGTTTT ACCTAAAAAT GAAAGTGGAT ATGATAAATT CAAAGGACGC 2280
ACTTACTGGA ATTTAACCTC GAAAGTGGAT ATGATAAATT CAAAGGACGC CCTCACTATT 2340
GACTCCAGAG GAAGCGATAG TGCAGGCACA CTTACCCAGC CTTATAATTT AAACGGTATA 2400
TCATTCAACA AAGACACTAC CTTTAATGTT GAACGAAATG CAAGAGTCAA CTTTGACATC 2460
AAGGCACCAA TAGGGATAAA TAAGTATTCT AGTTTGAATT ACGCATCATT TAATGGAAAC 2520
ATTTCAGTTT CGGGAGGGGG GAGTGTTGAT TTCACACTTC TCGCCTCATC CTCTAACGTC 2580
CAAACCCCCG GTGTAGTTAT AAATTCTAAA TACTTTAATG TTTCAACAGG GTCAAGTTTA 2640
AGATTTAAAA CTTCAGGCTC AACAAAAACT GGCTTCTCAA TAGAGAAAGA TTTAACTTTA 2700
AATGCCACCG GAGGCAACAT AACACTTTTG CAAGTTGAAG GCACCGATGG AATGATTGGT 2760
AAAGGCATTG TAGCCAAAAA AAACATAACC TTTGAAGGAG GTAAGATGAG GTTTGGCTCC 2820
AGGAAAGCCG TAACAGAAAT CGAAGGCAAT GTTACTATCA ATAACAACGC TAACGTCACT 2880
CTTATCGGTT CGGATTTTGA CAACCATCAA AAACCTTTAA CTATTAAAAA AGATGTCATC 2940
ATTAATAGCG GCAACCTTAC CGCTGGAGGC AATATTGTCA ATATAGCCGG AAATCTTACC 3000
GTTGAAAGTA ACGCTAATTT CAAAGCTATC ACAAATTTCA CTTTTAATGT AGGCGGCTTG 3060
TTTGACAACA AAGGCAATTC AAATATTTCC ATTGCCAAAG GAGGGGCTCG CTTTAAAGAC 3120
ATTGATAATT CCAAGAATTT AAGCATCACC ACCAACTCCA GCTCCACTTA CCGCACTATT 3180






t a z\ zv 2v zi canT 


r°* y\ f|Mf iim yi » » (11 TV 

vaAi 1 1/iAAxA 




AGGTAGTGAT 


3240 


ACTGAAATGC AAATTGGCGG CGATGTCTCG CAAAAAGAAG GTAATCTCAC GATTTCTTCT 3300
GACAAAATCA ATATTACCAA ACAGATAACA ATCAAGGCAG GTGTTGATGG GGAGAATTCC 3360
GATTCAGACG CGACAAACAA TGCCAATCTA ACCATTAAAA CCAAAGAATT GAAATTAACG 3420
CAAGACCTAA ATATTTCAGG TTTCAATAAA GCAGAGATTA CAGCTAAAGA TGGTAGTGAT 3480
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TT A A PT A TTfi 


GTAAPAPPAA 

VJ x nnv«nVywtn 


TAGTGPTGAT 


GGTAPTAATG 


PPAAAZilVZlGT 


A A PPTTTTA TA P 


J jtU 


PAGGTTAAAG 


ATTPAAAAAT 


PT PTG PTG AP 


GGTPAPAAGG 

vjvj x V— _nv— nnw w 


TGAPAPTAPA 


PAGP A A Af2Tf2 




GAAACATCCG 


GTAGTAATAA 


CAACACTGAA 


GATAGCAGTG 


ACAATAATGP 


CGGPTTAAPT 




ATPGATGPAA 


AAAATGTAAP 


AGTAAAPAAP 


AATATTAPTT 

X ^1 X X nv< X X 


PTPAPAAAGP 

v«. x v^nwvlriVJv. 


AGTGAGPATP 
*ib X OMbv-Al v_ 


j / ^ U 


TPTGPGAPAA 


GTGGAGAAAT 


TA PP A PTA A A 


7A.P7Af2f2TTAPTA.TA 


PP ATT A A PGP 


A A PP TA PTf2PT 


J/80 


A APGTGG AGA 

An\»U X OUnun 


TAAPPGPTPA 
liinv.v.uu X V_n 


A A P A GGT A GT 
rinL/iVJU X n\J X 


tv ppp» tv f2f3TG 
n 1 v,Ll riVjo X v3 


fZ AATTGAGTP 
Onri X X uAb X v» 


PAGPTPT/2PP 
Uiuv, 1L1 V3\aC 




TPTGTAAPAP 

l^lUl nn V— n w 


TTAPTGPAAP 


PGZXGGGPGPT 


^ x x x \j x nn 


GP A AT ATTTP 

Uvy/vii/iJi X IV, 


GGGPA APAPP 




GTTACTGTTA 


CTGCAAATAG 


CGGTG CATTA 


ACCACTTTGG 


CAGGCTCTAC 


AATTAAAGGA 




ACCGAGAGTG 


TAACCACTTC 


AAGTCAATCA 


GG CGATATCG 


G CGGTACGAT 

w www *#iw>wn a 


TTPTGGTGGP 

X X v> X X \3^7V» 




APAGTAGAGG 


TTA A AGP A AP 

X X nnnV3\vnnV*. 


PG A A AGTTT A 
V^OnnnO XXX n 


APPAPTPAAT 
nv« v^iv X wtft X 


PPAATTPAAA 
\»^nnX X wvvi 


AATTAAAGPA 
nnX Xnnn\3V«n 






ta p.pptta tv ppt 
AbbL lAALb 1 


TV TV^TATAr^T/^l/^TV 

AAWiAb 1 o^_-A 


TA f* TA fif^T TA P TA TA 
AuAbb X ALAA 


1 IbblbblAv* 


bAl 1 lLLbbl 


4140 


AATACGGTAA 


ATGTTACGG C 


Tv tv tv r*r* t »i i \r*^ 
AAACG CTGG C 


GATTT AA CAG 


ITPr'PH TV *p^*o 

TTGGGAATGG 


TV ^1 TV TV TV m^n 

CGCAGAAATT 


4200 


7v tv *r*r* r*r* tv r~* tv 
AATG CGACAG 


TV TV TV 

AAGGAG CTG C 


AAC CTTAACT 


TV ^ t\ rr<i^ tv t^^^^t 

A CAT CAT CGG 


^*^TV TV TV M WT»TV TV ^ 

GCAAATTAAC 


m TV TV TV j"m ■ 

TACCGAAGCT 


4260 


AG 11 CAUiCA 


TTACTT CAG C 


vUAAGG G 1 CJAG 


GTAAATCTTT 


CAG Cx CAGGA 


TGGT AG CGTT 


4320 


p p tv p p tv tv p tta 


1 lAAlbLLbL 


/-i TV TATV^T^TV ^"TV 

LAAlblGAUA 


t "l*TV TV TN ^PTA ^*rpTV 

1AAA1AL 1 A 


CAGGCACTTT 


TV TV PTTTA ^"»/^/*^WP 

AAC TACcGTG 


4380 


TV 7V /"^ »| 1 TV TV 


ACATTAATG C 


TV TV /"»#*■» TV »»f i 

AALUAbLbbl 


ACCTTGGTTA 


TTTtTiPPPTi TA TA 

I X AACGCAAA 


TA P TV PP P»TV^ TV 

AGACG CTGAG 


4440 


pta a atp.p.pp 


P TA P P TA TT/!/V 


T TA TA T\ r* TA O TA 


\S 1 1 nnnX O 


P TA TA r*r* TA TA 


TV TV TA TPPPTPP 

AAAlbbL x CC 


45 OO 


ouwivsLb 1 AA 


tpppptv pn tv p 

1 CGCGACAAC 


L. TCAAb UAGA 


O TP TV TV r* TA TP TA 

G I bAAUAl CA 


CTGGGGATTT 


TV TV TP TV PTV TV *PTV 

AATCACAATA 


4560 


7A TV T/"2 O TA T'P 7A Tv 


ATATCATTTC 


ta tv tv tv tv tv c*rur**T* 
AAAAAALbbi 


TV T* TV TV TV fhPPP 


lAwlbl Innn 


TAPPPP'P'PTV TV TV 

AbbLbl I AAA 


a c o r\ 


74 nim/"* tv n*P'P/ * TV 

Al 1GATGTGA 


TV TV TV P TV 'I'T'OTV 

AATACATT CA 


ACCGGGTATA 


G CAAG CGT AG 


TV TP TV TV nm* TV FT* 

ATGAAGTAAT 


•IV TV TV i~*r*r* TV TV TV 

TG AAG CG AAA 


4680 


CGCATCCTTG AGAAGGTAAA AGATTTATCT GATGAAGAAA GAGAAGCGTT AGCTAAACTT 4740
GGCGTAAGTG CTGTACGTTT TATTGAGCCA AATAATACAA TTACAGTCGA TACACAAAAT 4800
GAATTTGCAA CCAGACCATT AAGTCGAATA GTGATTTCTG AAGGCAGGGC GTGTTTCTCA 4860
AACAGTGATG GCGCGACGGT GTGCGTTAAT ATCGCTGATA ACGGGCGGTA GCGGTCAGTA 4920
ATTGACAAGG TAGATTTCAT CCTGGAATGA AGTCATTTTA TTTTCGTATT ATTTACTGTG 4980
TGGGTTAAAG TTCAGTACGG GCTTTACCCA TCTTGTAAAA AATTACGGAG AATACAATAA 5040
AGTATTTTTA ACAGGTTATT ATTATGAAAA ATATAAAAAG CAGATTAAAA CTCAGTGCAA 5100


TAT CAGTATT 


G CTTGG C CTG 


GCTTCTTCAT 


PTVfprp^tPPTV fp/^^ 

CATTGTATG C 


TV TV TV^TVTVP PP 

AG AAG AAG CG 


T'lTiXAGTAA 


5X60 


tv appp'P'P'pptv 
AAuuL ill CA 


GX lAlLlbbl 


r* ^ TA HTTP TV TA TV 


CI I 1 AAb JL G A 


TA n TA PP PPP TA TA 


PTPTPTPT TA P 
LlblLl blAb 


522 O 


CAAAATCTTT ATCTAAATAC CAAGGCTCGC AAACTTTAAC AAACCTAAAA ACAGCACAGC 5280
TTGAATTACA GGCTGTGCTA GATAAGATTG AGCCAAATAA GTTTGATGTG ATATTGCCAC 5340
AACAAACCAT TACGGATGGC AATATTATGT TTGAGCTAGT CTCGAAATCA GCCGCAGAAA 5400
GCCAAGTTTT TTATAAGGCG AGCCAGGGTT ATAGTGAAGA AAATATCGCT CGTAGCCTGC 5460
CATCTTTGAA ACAAGGAAAA GTGTATGAAG ATGGTCGTCA GTGGTTCGAT TTGCGTGAAT 5520
TCAATATGGC AAAAGAAAAT CCACTTAAAG TCACTCGCGT GCATTACGAG TTAAACCCTA 5580
AAAACAAAAC CTCTGATTTG GTAGTTGCAG GTTTTTCGCC TTTTGGCAAA ACGCGTAGCT 5640
TTGTTTCCTA TGATAATTTC GGCGCAAGGG AGTTTAACTA TCAACGTGTA AGTCTAGGTT 5700
TTGTAAATGC CAATTTGACC GGACATGATG ATGTATTAAA TCTAAACGCA TTGACCAATG 5760
TAAAAGCACC ATCAAAATCT TATGCGGTAG GCATAGGATA TACTTATCCG TTTTATGATA 5820
AACACCAATC CTTAAGTCTT TATACCAGCA TGAGTTATGC TGATTCTAAT GATATCGACG 5880
GCTTACCAAG TGCGATTAAT CGTAAATTAT CAAAAGGTCA ATCTATCTCT GCGAATCTGA 5940
AATGGAGTTA TTATCTCCCG ACATTTAACC TTGGAATGGA AGACCAGTTT AAAATTAATT 6000
TAGGCTACAA CTACCGCCAT ATTAATCAAA CATCCGAGTT AAACACCCTG GGTGCAACGA 6060
AGAAAAAATT TGCAGTATCA GGCGTAAGTG CAGGCATTGA TGGACATATC CAATTTACCC 6120
CTAAAACAAT CTTTAATATT GATTTAACTC ATCATTATTA CGCGAGTAAA TTACCAGGCT 6180
CTTTTGGAAT GGAGCGCATT GGCGAAACAT TTAATCGCAG CTATCACATT AGCACAGCCA 6240
GTTTAGGGTT GAGTCAAGAG TTTGCTCAAG GTTGGCATTT TAGCAGTCAA TTATCGGGTC 6300
AGTTTACTCT ACAAGATATA AGTAGCATAG ATTTATTCTC TGTAACAGGT ACTTATGGCG 6360
TCAGAGGCTT TAAATACGGC GGTGCAAGTG GTGAGCGCGG TCTTGTATGG CGTAATGAAT 6420
TAAGTATGCC AAAATACACC CGCTTTCAAA TCAGCCCTTA TGCGTTTTAT GATGCAGGTC 6480
AGTTCCGTTA TAATAGCGAA AATGCTAAAA CTTACGGCGA AGATATGCAC ACGGTATCCT 6540
CTGCGGGTTT AGGCATTAAA ACCTCTCCTA CACAAAACTT AAGCTTAGAT GCTTTTGTTG 6600
CTCGTCGCTT TGCAAATGCC AATAGTGACA ATTTGAATGG CAACAAAAAA CGCACAAGCT 6660
CACCTACAAC CTTCTGGGGT AGATTAACAT TCAGTTTCTA ACCCTGAAAT TTAATCAACT 6720
GGTAAGCGTT CCGCCTACCA GTTTATAACT ATATGCTTTA CCCGCCAATT TACAGTCTAT 6780
ACGCAACCCT GTTTTCATCC TTATATATCA AACAAACTAA GCAAACCAAG CAAACCAAGC 6840
AAACCAAGCA AACCAAGCAA ACCAAGCAAA CCAAGCAAAC CAAGCAAACC AAGCAAACCA 6900
AGCAAACCAA GCAAACCAAG CAAACCAAGC AAACCAAGCA ATGCTAAAAA ACAATTTATA 6960
TGATAAACTA AAACATACTC CATACCATGG CAATACAAGG GATTTAATAA TATGACAAAA 7020
GAAAATTTAC AAAGTGTTCC ACAAAATACG ACCGCTTCAC TTGTAGAATC AAACAACGAC 7080
CAAACTTCCC TGCAAATACT TAAACAACCA CCCAAACCCA ACCTATTACG CCTGGAACAA 7140
CATGTCGCCA AAAAAGATTA TGAGCTTGCT TGCCGCGAAT TAATGGCGAT TTTGGAAAAA 7200
ATGGACGCTA ATTTTGGAGG CGTTCACGAT ATTGAATTTG ACGCACCTGC TCAGCTGGCA 7260
TATCTACCCG AAAAACTACT AATTCATTTT GCCACTCGTC TCGCTAATGC AATTACAACA 7320
CTCTTTTCCG ACCCCGAATT GGCAATTTCC GAAGAAGGGG CATTAAAGAT GATTAGCCTG 7380
CAACGCTGGT TGACGCTGAT TTTTGCCTCT TCCCCCTACG TTAACGCAGA CCATATTCTC 7440
AATAAATATA ATATCAACCC AGATTCCGAA GGTGGCTTTC ATTTAGCAAC AGACAACTCT 7500


TCTATTGCTA AATTCTGTAT TTTTTACTTA CCCGAATCCA ATGTCAATAT GAGTTTAGAT 7560
GCGTTATGGG CAGGGAATCA ACAACTTTGT GCTTCATTGT GTTTTGCGTT GCAGTCTTCA 7620
CGTTTTATTG GTACTGCATC TGCGTTTCAT AAAAGAGCGG TGGTTTTACA GTGGTTTCCT 7680
AAAAAACTCG CCGAAATTGC TAATTTAGAT GAATTGCCTG CAAATATCCT TCATGATGTA 7740
TATATGCACT GCAGTTATGA TTTAGCAAAA AACAAGCACG ATGTTAAGCG TCCATTAAAC 7800
GAACTTGTCC GCAAGCATAT CCTCACGCAA GGATGGCAAG ACCGCTACCT TTACACCTTA 7860
GGTAAAAAGG ACGGCAAACC TGTGATGATG GTACTGCTTG AACATTTTAA TTCGGGACAT 7920
TCGATTTATC GCACGCATTC AACTTCAATG ATTGCTGCTC GAGAAAAATT CTATTTAGTC 7980
GGCTTAGGCC ATGAGGGCGT TGATAACATA GGTCGAGAAG TGTTTGACGA GTTCTTTGAA 8040
ATCAGTAGCA ATAATATAAT GGAGAGACTG TTTTTTATCC GTAAACAGTG CGAAACTTTC 8100
CAACCCGCAG TGTTCTATAT GCCAAGCATT GGCATGGATA TTACCACGAT TTTTGTGAGC 8160
AACACTCGGC TTGCCCCTAT TCAAGCTGTA GCCTTGGGTC ATCCTGCCAC TACGCATTCT 8220
GAATTTATTG ATTATGTCAT CGTAGAAGAT GATTATGTGG GCAGTGAAGA TTGTTTTAGC 8280
GAAACCCTTT TACGCTTACC CAAAGATGCC CTACCTTATG TACCATCTGC ACTCGCCCCA 8340
CAAAAAGTGG ATTATGTACT CAGGGAAAAC CCTGAAGTAG TCAATATCGG TATTGCCGCT 8400
ACCACAATGA AATTAAACCC TGAATTTTTG CTAACATTGC AAGAAATCAG AGATAAAGCT 8460
AAAGTCAAAA TACATTTTCA TTTCGCACTT GGACAATCAA CAGGCTTGAC ACACCCTTAT 8520
GTCAAATGGT TTATCGAAAG CTATTTAGGT GACGATGCCA CTGCACATCC CCACGCACCT 8580
TATCACGATT ATCTGGCAAT ATTGCGTGAT TGCGATATGC TACTAAATCC GTTTCCTTTC 8640
GGTAATACTA ACGGCATAAT TGATATGGTT ACATTAGGTT TAGTTGGTGT ATGCAAAACG 8700
GGGGATGAAG TACATGAACA TATTGATGAA GGTCTGTTTA AACGCTTAGG ACTACCAGAA 8760
TGGCTGATAG CCGACACACG AGAAACATAT ATTGAATGTG CTTTGCGTCT AGCAGAAAAC 8820
CATCAAGAAC GCCTTGAACT CCGTCGTTAC ATCATAGAAA ACAACGGCTT ACAAAAGCTT 8880
TTTACAGGCG ACCCTCGTCC ATTGGGCAAA ATACTGCTTA AGAAAACAAA TGAATGGAAG 8940
CGGAAGCACT TGAGTAAAAA ATAACGGTTT TTTAAAGTAA AAGTGCGGTT AATTTTCAAA 9000
GCGTTTTAAA AACCTCTCAA AAATCAACCG CACTTTTATC TTTATAACGC TCCCGCGCGC 9060
TGACAGTTTA TCTCTTTCTT AAAATACCCA TAAAATTGTG GCAATAGTTG GGTAATCAAA 9120
TTCAATTGTT GATACGGCAA ACTAAAGACG GCGCGTTCTT CGGCAGTCAT C 9171

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9323 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

CGCCACTTCA ATTTTGGATT GTTGAAATTC AACTAACCAA AAAGTGCGGT TAAAATCTGT 6 0 

GGAGAAAATA GGTTGTAGTG AAGAACGAGG TAATTGTTCA AAAGGATAAA GCTCTCTTAA 120 

TTGGGCATTG GTTGGCGTTT CTTTTTCGGT TAATAGTAAA TTATATTCTG GACGACTATG 18 0 

CAATCCACCA ACAACTTTAC CGTTGGTTTT AAGCGTTAAT GTAAG TTCTT GCTCTTCTTG 24 0 

GCGAATACGT AATCCCATTT TTTGTTTAGC AAGAAAATGA TCGGGATAAT CATAATAGGT 3 00 

GTTGCCCAAA AATAAATTTT GATGTTCTAA AATCATAAAT TTTGCAAGAT ATTGTGGCAA 360 

TTCAATACCT ATTTGTGGCG AAATCGCCAA TTTTAATTCA ATTTCTTGTA GCATAATATT 420 

TCCCACTCAA ATCAACTGGT TAAATATACA AGATAATAAA AATAAATCAA GATTTTTGTG 48 0 

ATGACAAACA ACAATTACAA CACCTTTTTT GCAGTCTATA TGCAAATATT TTAAAAAAAT 54 0 

AGTATAAATC CGCCATATAA AATGGTATAA TCTTTCATCT TTCATCTTTC ATCTTTCATC 600 

TTTCATCTTT CATCTTTCAT CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC 660 

ATCTTTCATC TTTCATCTTT CACATGAAAT GATGAACCGA GGGAAGGGAG GGAGGGGCAA 720 

GAATGAAGAG GGAGCTGAAC GAACGCAAAT GATAAAGTAA TTTAATTGTT CAACTAACCT 780 

TAGGAGAAAA TATGAACAAG ATATATCGTC TCAAATTCAG CAAACGCCTG AATGCTTTGG 840

TTGCTGTGTC TGAATTGGCA CGGGGTTGTG ACCATTCCAC AGAAAAAGGC AGCGAAAAAC 900

CTGCTCGCAT GAAAGTGCGT CACTTAGCGT TAAAGCCACT TTCCGCTATG TTACTATCTT 960

TAGGTGTAAC ATCTATTCCA CAATCTGTTT TAGCAAGCGG CAATTTAACA TCGACCAAAA 1020

TGAAATGGTG CAGTTTTTAC AAGAAAACAA GTAATAAAAC CATTATCCGC AACAGTGTTG 1080

ACGCTATCAT TAATTGGAAA CAATTTAACA TCGACCAAAA TGAAATGGTG CAGTTTTTAC 1140

AAGAAAACAA CAACTCCGCC GTATTCAACC GTGTTACATC TAACCAAATC TCCCAATTAA 1200

AAGGGATTTT AGATTCTAAC GGACAAGTCT TTTTAATCAA CCCAAATGGT ATCACAATAG 1260 

GTAAAGACGC AATTATTAAC ACTAATGGCT TTACGGCTTC TACGCTAGAC ATTTCTAACG 1320 

AAAACATCAA GGCGCGTAAT TTCACCTTCG AGCAAACCAA AGATAAAGCG CTCGCTGAAA 1380 

TTGTGAATCA CGGTTTAATT ACTGTCGGTA AAGACGGCAG TGTAAATCTT ATTGGTGGCA 1440 

AAGTGAAAAA CGAGGGTGTG ATTAGCGTAA ATGGTGGCAG CATTTCTTTA CTCGCAGGGC 1500 

AAAAAATCAC CATCAGCGAT ATAATAAACC CAACCATTAC TTACAGCATT GCCGCGCCTG 1560

AAAATGAAGC GGTCAATCTG GGCGATATTT TTGCCAAAGG CGGTAACATT AATGTCCGTG 1620

CTGCCACTAT TCGAAACCAA GGTAAACTTT CTGCTGATTC TGTAAGCAAA GATAAAAGCG 1680

GCAATATTGT TCTTTCCGCC AAAGAGGGTG AAGCGGAAAT TGGCGGTGTA ATTTCCGCTC 1740 

AAAATCAGCA AGCTAAAGGC GGCAAGCTGA TGATAAAGTC CGATAAAGTC ACATTAAAAA 1800 

CAGGTGCAGT TATCGACCTT TCAGGTAAAG AAGGGGGAGA AACTTACCTT GGCGGTGACG 1860 

AGCGCGGCGA AGGTAAAAAC GGCATTCAAT TAGCAAAGAA AACCTCTTTA GAAAAAGGCT 1920

CAACCATCAA TGTATCAGGC AAAGAAAAAG GCGGACGCGC TATTGTGTGG GGCGATATTG 1980

[positions 1981-2340 illegible in source]

[illegible] [illegible] [illegible] [illegible] AGGCGTTCAG ATTGATGGAG 2400

[illegible] [illegible] [illegible] TTTATTCTGG CGGATGGGTT GATGTTCATA 2460

[illegible] [illegible] [illegible] [illegible] [illegible] GCTTTTGAAG 2520

[positions 2521-2640 illegible in source]

AAGGTCTGAA TATCATTTCA TCAGTGAATA ATTTAACCCA CAATCTTAGT [illegible] 2700

ACATATCTGG GAATATAACA ATTAACCAAA CTACGAGAAA [illegible] TATTGGCAAA 2760

CCAGCCATGA TTCGCACTGG AACGTCAGTG CTCTTAATCT [illegible] GCAAATTTTA 2820

CCTTTATTAA ATACATTTCA AGCAATAGCA AAGGCTTAAC [illegible] [illegible] 2880

CAGGGGTGAA TTTTAACGGC GTAAATGGCA ACATGTCATT CAATCTCAAA GAAGGAGCGA 2940

AAGTTAATTT CAAATTAAAA CCAAACGAGA ACATGAACAC AAGCAAACCT TTACCAATTC 3000

GGTTTTTAGC CAATATCACA GCCACTGGTG GGGGCTCTGT [illegible] ATATATGCCA 3060

ACCATTCTGG CAGAGGGGCT GAGTTAAAAA TGAGTGAAAT TAATATCTCT AACGGCGCTA 3120

ATTTTACCTT AAATTCCCAT GTTCGCGGCG ATGACGCTTT TAAAATCAAC AAAGACTTAA 3180

CCATAAATGC AACCAATTCA AATTTCAGCC TCAGACAGAC GAAAGATGAT TTTTATGACG 3240

GGTACGCACG CAATGCCATC AATTCAACCT ACAACATATC CATTCTGGGC GGTAATGTCA 3300

CCCTTGGTGG ACAAAACTCA AGCAGCAGCA TTACGGGGAA TATTACTATC GAGAAAGCAG 3360

CAAATGTTAC GCTAGAAGCC AATAACGCCC CTAATCAGCA AAACATAAGG GATAGAGTTA 3420

TAAAACTTGG CAGCTTGCTC GTTAATGGGA GTTTAAGTTT [illegible] [illegible] 3480

TTAAAGGCAA TCTCACTATT TCAGAAAGCG CCACTTTTAA AGGAAAGACT AGAGATACCC 3540

TAAATATCAC CGGCAATTTT ACCAATAATG GCACTGCCGA AATTAATATA ACACAAGGAG 3600

TGGTAAAACT TGGCAATGTT ACCAATGATG GTGATTTAAA CATTACCACT CACGCTAAAC 3660

GCAACCAAAG AAGCATCATC GGCGGAGATA TAATCAACAA AAAAGGAAGC TTAAATATTA 3720

CAGACAGTAA TAATGATGCT GAAATCCAAA TTGGCGGCAA TATCTCGCAA AAAGAAGGCA 3780

ACCTCACGAT TTCTTCCGAT AAAATTAATA TCACCAAACA GATAACAATC AAAAAGGGTA 3840

TTGATGGAGA GGACTCTAGT TCAGATGCGA CAAGTAATGC CAACCTAACT ATTAAAACCA 3900

AAGAATTGAA ATTGACAGAA GACCTAAGTA TTTCAGGTTT CAATAAAGCA GAGATTACAG 3960

CCAAAGATGG TAGAGATTTA ACTATTGGCA ACAGTAATGA CGGTAACAGC GGTGCCGAAG 4020

CCAAAACAGT AACTTTTAAC AATGTTAAAG ATTCAAAAAT CTCTGCTGAC GGTCACAATG 4080

TGACACTAAA TAGCAAAGTG AAAACATCTA GCAGCAATGG CGGACGTGAA AGCAATAGCG 4140

ACAACGATAC CGGCTTAACT ATTACTGCAA AAAATGTAGA AGTAAACAAA GATATTACTT 4200

CTCTCAAAAC AGTAAATATC ACCGCGTCGG AAAAGGTTAC CACCACAGCA GGCTCGACCA 4260

TTAACGCAAC AAATGGCAAA GCAAGTATTA CAACCAAAAC AGGTGATATC AGCGGTACGA 4320

TTTCCGGTAA CACGGTAAGT GTTAGCGCGA CTGGTGATTT AACCACTAAA TCCGGCTCAA 4380

AAATTGAAGC GAAATCGGGT GAGGCTAATG TAACAAGTGC AACAGGTACA ATTGGCGGTA 4440

CAATTTCCGG TAATACGGTA AATGTTACGG CAAACGCTGG CGATTTAACA GTTGGGAATG 4500

GCGCAGAAAT TAATGCGACA GAAGGAGCTG CAACCTTAAC CGCAACAGGG AATACCTTGA 4560

CTACTGAAGC CGGTTCTAGC ATCACTTCAA CTAAGGGTCA GGTAGACCTC TTGGCTCAGA 4620

ATGGTAGCAT CGCAGGAAGC ATTAATGCTG CTAATGTGAC ATTAAATACT ACAGGCACCT 4680

TAACCACCGT GGCAGGCTCG GATATTAAAG CAACCAGCGG CACCTTGGTT ATTAACGCAA 4740

AAGATGCTAA GCTAAATGGT GATGCATCAG GTGATAGTAC AGAAGTGAAT GCAGTCAACG 4800

ACTGGGGATT TGGTAGTGTG ACTGCGGCAA CCTCAAGCAG TGTGAATATC ACTGGGGATT 4860

TAAACACAGT AAATGGGTTA AATATCATTT CGAAAGATGG TAGAAACACT GTGCGCTTAA 4920

GAGGCAAGGA AATTGAGGTG AAATATATCC AGCCAGGTGT AGCAAGTGTA GAAGAAGTAA 4980

TTGAAGCGAA ACGCGTCCTT GAAAAAGTAA AAGATTTATC TGATGAAGAA AGAGAAACAT 5040

TAGCTAAACT TGGTGTAAGT GCTGTACGTT TTGTTGAGCC AAATAATACA ATTACAGTCA 5100

ATACACAAAA TGAATTTACA ACCAGACCGT CAAGTCAAGT GATAATTTCT GAAGGTAAGG 5160

CGTGTTTCTC AAGTGGTAAT GGCGCACGAG TATGTACCAA TGTTGCTGAC GATGGACAGC 5220

CGTAGTCAGT AATTGACAAG GTAGATTTCA TCCTGCAATG AAGTCATTTT ATTTTCGTAT 5280

TATTTACTGT GTGGGTTAAA GTTCAGTACG GGCTTTACCC ATCTTGTAAA AAATTACGGA 5340

GAATACAATA AAGTATTTTT AACAGGTTAT TATTATGAAA AATATAAAAA GCAGATTAAA 5400

ACTCAGTGCA ATATCAGTAT TGCTTGGCCT GGCTTCTTCA TCATTGTATG CAGAAGAAGC 5460

GTTTTTAGTA AAAGGCTTTC AGTTATCTGG TGCACTTGAA ACTTTAAGTG AAGACGCCCA 5520

ACTGTCTGTA GCAAAATCTT TATCTAAATA CCAAGGCTCG CAAACTTTAA CAAACCTAAA 5580

AACAGCACAG CTTGAATTAC AGGCTGTGCT AGATAAGATT GAGCCAAATA AATTTGATGT 5640

GATATTGCCG CAACAAACCA TTACGGATGG CAATATCATG TTTGAGCTAG TCTCGAAATC 5700

AGCCGCAGAA AGCCAAGTTT TTTATAAGGC GAGCCAGGGT TATAGTGAAG AAAATATCGC 5760

TCGTAGCCTG CCATCTTTGA AACAAGGAAA AGTGTATGAA GATGGTCGTC AGTGGTTCGA 5820

TTTGCGTGAA TTTAATATGG CAAAAGAAAA CCCGCTTAAG GTTACCCGTG TACATTACGA 5880

ACTAAACCCT AAAAACAAAA CCTCTAATTT GATAATTGCG GGCTTCTCGC CTTTTGGTAA 5940

AACGCGTAGC TTTATTTCTT ATGATAATTT CGGCGCGAGA GAGTTTAACT ACCAACGTGT 6000

AAGCTTGGGT TTTGTTAATG CCAATTTAAC TGGTCATGAT GATGTGTTAA TTATACCAGT 6060

ATGAGTTATG CTGATTCTAA TGATATCGAC GGCTTACCAA GTGCGATTAA TCGTAAATTA 6120

TCAAAAGGTC AATCTATCTC TGCGAATCTG AAATGGAGTT ATTATCTCCC AAGATTTAAC 6180

CTTGGCATGG AAGACCAATT TAAAATTAAT TTAGGCTACA ACTACCGCCA TATTAATCAA 6240

ACCTCCGCGT TAAATCGCTT GGGTGAAACG AAGAAAAAAT TTGCAGTATC AGGCGTAAGT 6300

GCAGGCATTG ATGGACATAT CCAATTTACC CCTAAAACAA TCTTTAATAT TGATTTAACT 6360

CATCATTATT ACGCGAGTAA ATTACCAGGC TCTTTTGGAA TGGAGCGCAT TGGCGAAACA 6420

TTTAATCGCA GCTATCACAT TAGCACAGCC AGTTTAGGGT TGAGTCAAGA GTTTGCTCAA 6480

GGTTGGCATT TTAGCAGTCA ATTATCAGGT CAATTTACTC TACAAGATAT TAGCAGTATA 6540

GATTTATTCT CTGTAACAGG TACTTATGGC GTCAGAGGCT TTAAATACGG CGGTGCAAGT 6600

GGTGAGCGCG GTCTTGTATG GCGTAATGAA TTAAGTATGC CAAAATACAC CCGCTTCCAA 6660

ATCAGCCCTT ATGCGTTTTA TGATGCAGGT CAGTTCCGTT ATAATAGCGA AAATGCTAAA 6720

ACTTACGGCG AAGATATGCA CACGGTATCC TCTGCGGGTT TAGGCATTAA AACCTCTCCT 6780

ACACAAAACT TAAGCCTAGA TGCTTTTGTT GCTCGTCGCT TTGCAAATGC CAATAGTGAC 6840

AATTTGAATG GCAACAAAAA ACGCACAAGC TCACCTACAA CCTTCTGGGG GAGATTAACA 6900

TTCAGTTTCT AACCCTGAAA TTTAATCAAC TGGTAAGCGT TCCGCCTACC AGTTTATAAC 6960

TATATGCTTT ACCCGCCAAT TTACAGTCTA TAGGCAACCC TGTTTTTACC CTTATATATC 7020

AAATAAACAA GCTAAGCTGA GCTAAGCAAA CCAAGCAAAC TCAAGCAAGC CAAGTAATAC 7080

TAAAAAAACA ATTTATATGA TAAACTAAAG TATACTCCAT GCCATGGCGA TACAAGGGAT 7140

TTAATAATAT GACAAAAGAA AATTTGCAAA ACGCTCCTCA AGATGCGACC GCTTTACTTG 7200

CGGAATTAAG CAACAATCAA ACTCCCCTGC GAATATTTAA ACAACCACGC AAGCCCAGCC 7260

TATTACGCTT GGAACAACAT ATCGCAAAAA AAGATTATGA GTTTGCTTGT CGTGAATTAA 7320

TGGTGATTCT GGAAAAAATG GACGCTAATT TTGGAGGCGT TCACGATATT GAATTTGACG 7380

CACCCGCTCA GCTGGCATAT CTACCCGAAA AATTACTAAT TTATTTTGCC ACTCGTCTCG 7440

CTAATGCAAT TACAACACTC TTTTCCGACC CCGAATTGGC AATTTCTGAA GAAGGGGCGT 7500

TAAAGATGAT TAGCCTGCAA CGCTGGTTGA CGCTGATTTT TGCCTCTTCC CCCTACGTTA 7560

ACGCAGACCA TATTCTCAAT AAATATAATA TCAACCCAGA TTCCGAAGGT GGCTTTCATT 7620

TAGCAACAGA CAACTCTTCT ATTGCTAAAT TCTGTATTTT TTACTTACCC GAATCCAATG 7680

TCAATATGAG TTTAGATGCG TTATGGGCAG GGAATCAACA ACTTTGTGCT TCATTGTGTT 7740

TTGCGTTGCA GTCTTCACGT TTTATTGGTA CCGCATCTGC GTTTCATAAA AGAGCGGTGG 7800

TTTTACAGTG GTTTCCTAAA AAACTCGCCG AAATTGCTAA TTTAGATGAA TTGCCTGCAA 7860

ATATCCTTCA TGATGTATAT ATGCACTGCA GTTATGATTT AGCAAAAAAC AAGCACGATG 7920

TTAAGCGTCC ATTAAACGAA CTTGTCCGCA AGCATATCCT CACGCAAGGA TGGCAAGACC 7980

GCTACCTTTA CACCTTAGGT AAAAAGGACG GCAAACCTGT GATGATGGTA CTGCTTGAAC 8040

ATTTTAATTC GGGACATTCG ATTTATCGTA CACATTCAAC TTCAATGATT GCTGCTCGAG 8100

AAAAATTCTA TTTAGTCGGC TTAGGCCATG AGGGCGTTGA TAAAATAGGT CGAGAAGTGT 8160

TTGACGAGTT CTTTGAAATC AGTAGCAATA ATATAATGGA GAGACTGTTT TTTATCCGTA 8220

AACAGTGCGA AACTTTCCAA CCCGCAGTGT TCTATATGCC AAGCATTGGC ATGGATATTA 8280

CCACGATTTT TGTGAGCAAC ACTCGGCTTG CCCCTATTCA AGCTGTAGCC CTGGGTCATC 8340

CTGCCACTAC GCATTCTGAA TTTATTGATT ATGTCATCGT AGAAGATGAT TATGTGGGCA 8400

GTGAAGATTG TTTCAGCGAA ACCCTTTTAC GCTTACCCAA AGATGCCCTA CCTTATGTAC 8460

CTTCTGCACT CGCCCCACAA AAAGTGGATT ATGTACTCAG GGAAAACCCT GAAGTAGTCA 8520

ATATCGGTAT TGCCGCTACC ACAATGAAAT TAAACCCTGA ATTTTTGCTA ACATTGCAAG 8580

AAATCAGAGA TAAAGCTAAA GTCAAAATAC ATTTTCATTT CGCACTTGGA CAATCAACAG 8640

GCTTGACACA CCCTTATGTC AAATGGTTTA TCGAAAGCTA TTTAGGTGAC GATGCCACTG 8700

CACATCCCCA CGCACCTTAT CACGATTATC TGGCAATATT GCGTGATTGC GATATGCTAC 8760

TAAATCCGTT TCCTTTCGGT AATACTAACG GCATAATTGA TATGGTTACA TTAGGTTTAG 8820

TTGGTGTATG CAAAACGGGG GATGAAGTAC ATGAACATAT TGATGAAGGT CTGTTTAAAC 8880

GCTTAGGACT ACCAGAATGG CTGATAGCCG ACACACGAGA AACATATATT GAATGTGCTT 8940

TGCGTCTAGC AGAAAACCAT CAAGAACGCC TTGAACTCCG TCGTTACATC ATAGAAAACA 9000

ACGGCTTACA AAAGCTTTTT ACAGGCGACC CTCGTCCATT GGGCAAAATA CTGCTTAAGA 9060

AAACAAATGA ATGGAAGCGG AAGCACTTGA GTAAAAAATA [illegible] AAAGTAAAAG 9120

TGCGGTTAAT TTTCAAAGCG TTTTAAAAAC CTCTCAAAAA TCAACCGCAC TTTTATCTTT 9180

ATAACGATCC CGCACGCTGA CAGTTTATCA GCCTCCCGCC ATAAAACTCC GCCTTTCATG 9240

GCGGAGATTT TAGCCAAAAC TGGCAGAAAT TAAAGGCTAA AATCACCAAA TTGCACCACA 9300

AAATCACCAA TACCCACAAA AAA 9323



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4287 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

GATCAATCTG GGCGATATTT TTGCCAAAGG TGGTAACATT AATGTCCGCG CTGCCACTAT 60 

TCGCAATAAA GGTAAACTTT CTGCCGACTC TGTAAGCAAA GATAAAAGTG GTAACATTGT 120 

TCTCTCTGCC AAAGAAGGTG AAGCGGAAAT TGGCGGTGTA ATTTCCGCTC AAAATCAGCA 180

AGCCAAAGGT GGTAAGTTGA TGATTACAGG CGATAAAGTT ACATTGAAAA CGGGTGCACT 240 

TATCGACCTT TCGGGTAAAG AAGGGGGAGA AACTTATCTT GGCGGTGACG AGCGTGGCGA 300 

AGGTAAAAAC GGCATTCAAT TAGCAAAGAA AACCACTTTA GAAAAAGGCT CAACAATTAA 360

TGTGTCAGGT AAAGAAAAAG CTGGGCGCGC TATTGTATGG GGCGATATTG CGTTAATTGA 420

CGGCAATATT AATGCCCAAG GTAAAGATAT CGCTAAAACT GGTGGTTTTG TGGAGACGTC 480

GGGGCATTAC TTATCCATTG ATGATAACGC AATTGTTAAA ACAAAAGAAT GGCTACTAGA 540

CCCAGAGAAT GTGACTATTG [illegible] [illegible] GTCGAGCTGG GTGCCGATAG 600

GAATTCCCAC TCGGCAGAGG TGATAAAAGT GACCCTAAAA AAAAATAACA [illegible] 660

AACACTAACC AATACAACCA [illegible] TCTGAAAAGT GCCCACGTGG [illegible] 720

GGCAAGGAGA AAACTTACCG TTAATAGCTC TATCAGTATA GAAAGAGGCT CCCACTTAAT 780

TCTCCACAGT GAAGGTCAGG GCGGTCAAGG TGTTCAGATT GATAAAGATA TTACTTCTGA 840

AGGCGGAAAT TTAACCATTT ATTCTGGCGG ATGGGTTGAT GTTCATAAAA ATATTACGCT 900

TGGTAGCGGC TTTTTAAACA TCACAACTAA AGAAGGAGAT ATCGCCTTCG AAGACAAGTC 960

TGGACGGAAC AACCTAACCA TTACAGCCCA AGGGACCATC ACCTCAGGTA ATAGTAACGG 1020

[positions 1021-1320 illegible in source]

ATTAAATGGC [illegible] [illegible] [illegible] [illegible] [illegible] 1380

[positions 1381-1440 illegible in source]

TGAAGATATT [illegible] GGGGGGGTAG [illegible] [illegible] [illegible] 1500

[illegible] ACCCCTGGCG [illegible] [illegible] [illegible] [illegible] 1560

AACTTTAAAT CTCAAGGCTG AAGGTTCAAC [illegible] [illegible] [illegible] 1620

[positions 1621-2040 illegible in source]

[illegible] [illegible] GCTTAAATAT [illegible] [illegible] [illegible] 2100

CATTATAAAA GGCAATATAT CCAACAAATC AGGTGATTTG AATATTATTG ATAAAAAAAG 2160

CGACGCTGAA ATCCAAATTG GCGGCAATAT CTCACAAAAA GAAGGCAATC [illegible] 2220

TTCTGATAAA GTAAATATTA CCAATCAGAT AACAATCAAA GCAGGCGTTG AAGGGGGGCG 2280

TTCTGATTCA AGTGAGGCAG AAAATGCTAA CCTAACTATT CAAACCAAAG AGTTAAAATT 2340

GGCAGGAGAC CTAAATATTT CAGGCTTTAA TAAAGCAGAA ATTACAGCTA AAAATGGCAG 2400

TGATTTAACT ATTGGCAATG CTAGCGGTGG TAATGCTGAT GCTAAAAAAG TGACTTTTGA 2460

CAAGGTTAAA GATTCAAAAA TCTCGACTGA CGGTCACAAT GTAACACTAA ATAGCGAAGT 2520

GAAAACGTCT AATGGTAGTA GCAATGCTGG TAATGATAAC AGCACCGGTT TAACCATTTC 2580

CGCAAAAGAT GTAACGGTAA ACAATAACGT TACCTCCCAC AAGACAATAA ATATCTCTGC 2640

CGCAGCAGGA [illegible] CCAAAGAAGG CACAACTATC AATGCAACCA CAGGCAGCGT 2700

GGAAGTAACT GCTCAAAATG GTACAATTAA AGGCAACATT ACCTCGCAAA ATGTAACAGT 2760

GACAGCAACA GAAAATCTTG TTACCACAGA GAATGCTGTC ATTAATGCAA CCAGCGGCAC 2820

AGTAAACATT AGTACAAAAA CAGGGGATAT TAAAGGTGGA ATTGAATCAA CTTCCGGTAA 2880

TGTAAATATT ACAGCGAGCG GCAATACACT TAAGGTAAGT AATATCACTG GTCAAGATGT 2940

AACAGTAACA GCGGATGCAG GAGCCTTGAC AACTACAGCA GGCTCAACCA TTAGTGCGAC 3000

AACAGGCAAT GCAAATATTA CAACCAAAAC AGGTGATATC AACGGTAAAG TTGAATCCAG 3060

CTCCGGCTCT GTAACACTTG TTGCAACTGG AGCAACTCTT GCTGTAGGTA ATATTTCAGG 3120

TAACACTGTT ACTATTACTG CGGATAGCGG TAAATTAACC TCCACAGTAG GTTCTACAAT 3180

TAATGGGACT AATAGTGTAA CCACCTCAAG CCAATCAGGC GATATTGAAG GTACAATTTC 3240

TGGTAATACA GTAAATGTTA CAGCAAGCAC TGGTGATTTA ACTATTGGAA ATAGTGCAAA 3300

AGTTGAAGCG AAAAATGGAG CTGCAACCTT AACTGCTGAA TCAGGCAAAT TAACCACCCA 3360

AACAGGCTCT AGCATTACCT CAAGCAATGG TCAGACAACT CTTACAGCCA AGGATAGCAG 3420

TATCGCAGGA AACATTAATG CTGCTAATGT GACGTTAAAT ACCACAGGCA CTTTAACTAC 3480

TACAGGGGAT TCAAAGATTA ACGCAACCAG TGGTACCTTA ACAATCAATG CAAAAGATGC 3540

CAAATTAGAT GGTGCTGCAT CAGGTGACCG CACAGTAGTA AATGCAACTA ACGCAAGTGG 3600

CTCTGGTAAC GTGACTGCGA AAACCTCAAG CAGCGTGAAT ATCACCGGGG ATTTAAACAC 3660

AATAAATGGG TTAAATATCA TTTCGGAAAA TGGTAGAAAC ACTGTGCGCT TAAGAGGCAA 3720

GGAAATTGAT GTGAAATATA TCCAACCAGG TGTAGCAAGC GTAGAAGAGG TAATTGAAGC 3780

GAAACGCGTC CTTGAGAAGG TAAAAGATTT ATCTGATGAA GAAAGAGAAA CACTAGCCAA 3840

ACTTGGTGTA AGTGCTGTAC GTTTCGTTGA GCCAAATAAT GCCATTACGG TTAATACACA 3900

AAACGAGTTT ACAACCAAAC CATCAAGTCA AGTGACAATT TCTGAAGGTA AGGCGTGTTT 3960

CTCAAGTGGT AATGGCGCAC GAGTATGTAC CAATGTTGCT GACGATGGAC AGCAGTAGTC 4020

AGTAATTGAC AAGGTAGATT TCATCCTGCA ATGAAGTCAT TTTATTTTCG TATTATTTAC 4080

TGTGTGGGTT AAAGTTCAGT ACGGGCTTTA CCCACCTTGT AAAAAATTAC GAAAAATACA 4140

ATAAAGTATT TTTAACAGGT TATTATTATG AAAAACATAA AAAGCAGATT AAAACTCAGT 4200

GCAATATCAA TATTGCTTGG CTTGGCTTCT TCATCGACGT ATGCAGAAGA AGCGTTTTTA 4260

GTAAAAGGCT TTCAGTTATC TGGCGCG 4287

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4702 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GGGAATGAGC GTCGTACACG GTACAGCAAC CATGCAAGTA GACGGCAATA AAACCACTAT 60

CCGTAATAGC ATCAATGCTA TCATCAATTG GAAACAATTT AACATTGACC AAAATGAAAT 120

GGAGCAGTTT TTACAAGAAA GCAGCAACTC TGCCGTTTTC AACCGTGTTA CATCTGACCA 180

AATCTCCCAA TTAAAAGGGA TTTTAGATTC TAACGGACAA GTCTTTTTAA TCAACCCAAA 240

TGGTATCACA ATAGGTAAAG ACGCAATTAT TAACACTAAT GGCTTTACTG CTTCTACGCT 300

AGACATTTCT AACGAAAACA TCAAGGCGCG TAATTTCACC CTTGAGCAAA CCAAGGATAA 360

AGCACTCGCT GAAATCGTGA ATCACGGTTT AATTACCGTT GGTAAAGACG GTAGCGTAAA 420

CCTTATTGGT GGCAAAGTGA AAAACGAGGG CGTGATTAGC GTAAATGGCG GTAGTATTTC 480

TTTACTTGCA

[positions 491-1620 not present in source text]

GGGCAAGGGT TTAAAGTTTA TTGCAAATCA AAATAATTTC ACTCATAAAT TTGATGGCGA 1680

AATTAACATA TCTGGAATAG TAACAATTAA CCAAACCACG AAAAAAGATG TTAAATACTG 1740

GAATGCATCA AAAGACTCTT ACTGGAATGT TTCTTCTCTT ACTTTGAATA CGGTGCAAAA 1800

ATTTACCTTT ATAAAATTCG TTGATAGCGG CTCAAATTCC CAAGATTTGA GGTCATCACG 1860

TAGAAGTTTT GCAGGCGTAC ATTTTAACGG CATCGGAGGC AAAACAAACT TCAACATCGG 1920

AGCTAACGCA AAAGCCTTAT TTAAATTAAA ACCAAACGCC GCTACAGACC CAAAAAAAGA 1980

ATTACCTATT ACTTTTAACG CCAACATTAC AGCTACCGGT AACAGTGATA GCTCTGTGAT 2040

GTTTGACATA CACGCCAATC TTACCTCTAG AGCTGCCGGC ATAAACATGG ATTCAATTAA 2100

CATTACCGGC GGGCTTGACT TTTCCATAAC ATCCCATAAT CGCAATAGTA ATGCTTTTGA 2160

AATCAAAAAA GACTTAACTA TAAATGCAAC TGGCTCGAAT TTTAGTCTTA AGCAAACGAA 2220

AGATTCTTTT TATAATGAAT ACAGCAAACA CGCCATTAAC TCAAGTCATA ATCTAACCAT 2280

TCTTGGCGGC AATGTCACTC TAGGTGGGGA AAATTCAAGC AGTAGCATTA CGGGCAATAT 2340

CAATATCACC AATAAAGCAA ATGTTACATT ACAAGCTGAC ACCAGCAACA GCAACACAGG 2400

CTTGAAGAAA AGAACTCTAA CTCTTGGCAA TATATCTGTT GAGGGGAATT TAAGCCTAAC 2460

TGGTGCAAAT GCAAACATTG TCGGCAATCT TTCTATTGCA GAAGATTCCA CATTTAAAGG 2520

AGAAGCCAGT GACAACCTAA ACATCACCGG CACCTTTACC AACAACGGTA CCGCCAACAT 2580

TAATATAAAA CAAGGAGTGG TAAAACTCCA AGGCGATATT ATCAATAAAG GTGGTTTAAA 2640

TATCACTACT AACGCCTCAG GCACTCAAAA AACCATTATT AACGGAAATA TAACTAACGA 2700

AAAAGGCGAC TTAAACATCA AGAATATTAA AGCCGACGCC GAAATCCAAA TTGGCGGCAA 2760

TATCTCACAA AAAGAAGGCA ATCTCACAAT TTCTTCTGAT AAAGTAAATA TTACCAATCA 2820

GATAACAATC AAAGCAGGCG TTGAAGGGGG GCGTTCTGAT TCAAGTGAGG CAGAAAATGC 2880

TAACCTAACT ATTCAAACCA AAGAGTTAAA ATTGGCAGGA GACCTAAATA TTTCAGGCTT 2940

TAATAAAGCA GAAATTACAG CTAAAAATGG CAGTGATTTA ACTATTGGCA ATGCTAGCGG 3000

TGGTAATGCT GATGCTAAAA AAGTGACTTT TGACAAGGTT AAAGATTCAA AAATCTCGAC 3060

TGACGGTCAC AATGTAACAC TAAATAGCGA AGTGAAAACG TCTAATGGTA GTAGCAATGC 3120

TGGTAATGAT AACAGCACCG GTTTAACCAT TTCCGCAAAA GATGTAACGG TAAACAATAA 3180

CGTTACCTCC CACAAGACAA TAAATATCTC TGCCGCAGCA GGAAATGTAA CAACCAAAGA 3240

AGGCACAACT ATCAATGCAA CCACAGGCAG CGTGGAAGTA ACTGCTCAAA ATGGTACAAT 3300

TAAAGGCAAC ATTACCTCGC AAAATGTAAC AGTGACAGCA ACAGAAAATC TTGTTACCAC 3360

AGAGAATGCT GTCATTAATG CAACCAGCGG CACAGTAAAC ATTAGTACAA AAACAGGGGA 3420

TATTAAAGGT GGAATTGAAT CAACTTCCGG TAATGTAAAT ATTACAGCGA GCGGCAATAC 3480

ACTTAAGGTA AGTAATATCA CTGGTCAAGA TGTAACAGTA ACAGCGGATG CAGGAGCCTT 3540

GACAACTACA GCAGGCTCAA CCATTAGTGC GACAACAGGC AATGCAAATA TTACAACCAA 3600

AACAGGTGAT ATCAACGGTA AAGTTGAATC CAGCTCCGGC TCTGTAACAC TTGTTGCAAC 3660

CLAIMS

What we claim is: 

1. A vaccine against disease caused by non-typeable
Haemophilus influenzae, including otitis media, sinusitis
and bronchitis, comprising an effective amount of a high
molecular weight protein of non-typeable Haemophilus
influenzae which is protein HMW1, HMW2, HMW3 or HMW4 or
a variant or fragment of said protein retaining
immunological properties thereof or a synthetic peptide
having an amino acid sequence corresponding to that of
said protein, and a physiological carrier therefor.

2. The vaccine of claim 1 wherein said protein is HMW1
encoded by the DNA sequence shown in Figure 1 (SEQ ID
NO: 1), having the derived amino acid sequence of Figure
2 (SEQ ID NO: 2) and having an apparent molecular weight
of 125 kDa.

3. The vaccine of claim 1 wherein said protein is HMW2
encoded by the DNA sequence shown in Figure 3 (SEQ ID
NO: 3), having the derived amino acid sequence of Figure
4 (SEQ ID NO: 4) and having an apparent molecular weight
of 120 kDa.
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